Open Source Club Talk: DK Panda
DK Panda, Professor, The Ohio State University
Thursday, January 24th at 7:30pm
120 Caldwell Labs
How to Design Software Libraries for Top Supercomputers in the World?
Part III: Deep Learning
During the last decade, High-Performance Computing (HPC) clusters with multi-/many-core processors, accelerators, and high-performance interconnects (such as InfiniBand, Omni-Path, iWARP, and RoCE) have grown significantly. Many of the world's supercomputers are currently being built from commodity HPC clusters. The Network-Based Computing Laboratory (http://nowlab.cse.ohio-state.edu) at OSU/CSE is actively engaged in designing software libraries (HPC, Big Data, Deep Learning, and Cloud) for such supercomputers. These activities will be covered in a series of four presentations, which will also outline the associated research, publications, designs, and testing and support framework for these libraries. Opportunities for students to get involved in the R&D activities of these projects will also be presented.
During the third talk of this academic year, we will focus on the High-Performance Deep Learning (HiDL) project (http://hidl.cse.ohio-state.edu). As part of this project, high-performance versions of common Deep Learning (DL) frameworks such as TensorFlow and Caffe have been developed. We will start with an overview of interesting trends in Deep Neural Network (DNN) design and how cutting-edge hardware architectures are playing a key role in moving the field forward. We will also present an overview of different DNN architectures and DL frameworks. Most DL frameworks started with a single-node/single-GPU design; however, approaches to parallelize DNN training are now being actively explored. The DL community has adopted several distributed training designs that exploit communication runtimes such as gRPC, MPI, and NCCL. In this context, we will highlight new challenges and opportunities for communication runtimes to efficiently support distributed DNN training. We will also highlight some of our co-design efforts to use CUDA-Aware MPI for large-scale DNN training on modern GPU clusters.
The talk will be followed by an open Q&A session with several members of the Network-Based Computing Laboratory. The session will conclude with a tour of the Laboratory, which houses multiple high-end clusters with thousands of cores.
Bio: DK Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at The Ohio State University. He has published over 450 papers in the area of high-end computing and networking. The MVAPICH2 (High Performance MPI and PGAS over InfiniBand, Omni-Path, iWARP and RoCE) libraries, designed and developed by his research group (http://mvapich.cse.ohio-state.edu), are currently being used by more than 2,950 organizations worldwide (in 86 countries). More than 518,000 downloads of this software have taken place from the project's site. This software is empowering several InfiniBand clusters (including the 3rd, 14th, 17th, and 27th ranked ones) in the TOP500 list. The RDMA packages for Apache Spark, Apache Hadoop, Memcached, HBase, and Kafka, together with the OSU HiBD benchmarks from his group (http://hibd.cse.ohio-state.edu), are also publicly available. These libraries are currently being used by more than 300 organizations in 35 countries. More than 28,900 downloads of these libraries have taken place. High-performance and scalable versions of the Caffe and TensorFlow frameworks are available from https://hidl.cse.ohio-state.edu. Prof. Panda is an IEEE Fellow. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda.