PyTorch Distributed: Experiences on Accelerating Data Parallel Training

International Conference on Very Large Databases (VLDB)


This paper presents the design, implementation, and evaluation of the PyTorch distributed data parallel module. PyTorch is a widely-adopted scientific computing package used in deep learning research and applications. Recent advances in deep learning argue for the value of large datasets and large models, which necessitates the ability to scale out model training to more computational resources. Data parallelism has emerged as a popular solution for distributed training thanks to its straightforward principle and broad applicability. In general, the technique of distributed data parallelism replicates the model on every computational resource to generate gradients independently and then communicates those gradients at each iteration to keep model replicas consistent. Despite the conceptual simplicity of the technique, the subtle dependencies between computation and communication make it non-trivial to optimize the distributed training efficiency. As of v1.5, PyTorch natively provides several techniques to accelerate distributed data parallel, including bucketing gradients, overlapping computation with communication, and skipping gradient synchronization. Evaluations show that, when configured appropriately, the PyTorch distributed data parallel module attains near-linear scalability using 256 GPUs.

Related Publications

All Publications

Inductive Sequentialization of Asynchronous Programs

Bernhard Kragl, Constantin Enea, Thomas A. Henzinger, Suha Orhun Mutluergil, Shaz Qadeer

PLDI - July 15, 2020

FastPay: High-Performance Byzantine Fault Tolerant Settlement

Mathieu Baudet, George Danezis, Alberto Sonnino

AFT - November 1, 2020

Refinement for Structured Concurrent Programs

Bernhard Kragl, Shaz Qadeer, Thomas A. Henzinger

CAV - July 15, 2020

An Exploration of Embodied Visual Exploration

Santhosh K. Ramakrishnan, Dinesh Jayaraman, Kristen Grauman

arXiv - August 21, 2020

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookies Policy