Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms

International Symposium on Computer Architecture (ISCA)


Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators (e.g., GPU/TPU) via fast, customized interconnects with 100s of gigabytes (GBs) of bandwidth. However, as we identify in this work, driving this bandwidth is quite challenging. This is because there is a pernicious balance between using the accelerator’s compute and memory for both DL computations and communication.

This work makes two key contributions. First, via real system measurements and detailed modeling, we provide an understanding of compute and memory bandwidth demands for DL compute and comms. Second, we propose a novel DL collective communication accelerator called Accelerator Collectives Engine (ACE) that sits alongside the compute and networking engines at the accelerator endpoint. ACE frees up the endpoint’s compute and memory resources for DL compute, which in turn reduces the required memory BW by 3.5× on average to drive the same network BW compared to state-of-the-art baselines. For modern DL workloads and different network sizes, ACE, on average, increases the effective network bandwidth utilization by 1.44× (up to 2.52×), resulting in average of 1.41× (up to 1.51×), 1.12× (up to 1.17×), and 1.13× (up to 1.19×) speedup in iteration time for ResNet-50, GNMT and DLRM when compared to the best baseline configuration, respectively.

Related Publications

All Publications

SIGDIAL - August 1, 2021

Annotation Inconsistency and Entity Bias in MultiWOZ

Kun Qian, Ahmad Berrami, Zhouhan Lin, Ankita De, Alborz Geramifard, Zhou Yu, Chinnadhurai Sankar

Uncertainty and Robustness in Deep Learning Workshop at ICML - August 1, 2020

Tilted Empirical Risk Minimization

Tian Li, Ahmad Beirami, Maziar Sanjabi, Virginia Smith

arxiv - November 1, 2020

The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes

Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, Davide Testuggine

ICML - July 24, 2021

Using Bifurcations for Diversity in Differentiable Games

Jonathan Lorraine, Jack Parker-Holder, Paul Vicol, Aldo Pacchiano, Luke Metz, Tal Kachman, Jakob Foerster

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookies Policy