
December 2, 2019

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Neural Information Processing Systems (NeurIPS)

In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.

By: Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala

November 30, 2019

MVFST-RL: An Asynchronous RL Framework for Congestion Control with Delayed Actions

Workshop on ML for Systems at NeurIPS

We present MVFST-RL, a scalable framework for congestion control in the QUIC transport protocol that leverages the state of the art in asynchronous RL training with off-policy correction. We analyze modeling improvements to mitigate the deviation from Markovian dynamics, and evaluate our method on emulated networks from the Pantheon benchmark platform.

By: Viswanath Sivakumar, Tim Rocktäschel, Alexander H. Miller, Heinrich Küttler, Nantas Nardelli, Mike Rabbat, Joelle Pineau, Sebastian Riedel
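The abstract mentions asynchronous RL training with off-policy correction. As a minimal sketch, assuming a V-trace-style correction of the kind used by IMPALA-family learners, the function below computes value targets for one trajectory whose actions came from a possibly stale behaviour policy. The name `vtrace_targets`, its parameters, and the list-based interface are illustrative assumptions, not MVFST-RL's code.

```python
import math
from typing import List


def vtrace_targets(
    rewards: List[float],
    values: List[float],       # learner's V(x_0) .. V(x_{n-1})
    bootstrap_value: float,    # learner's V(x_n) after the last step
    log_rhos: List[float],     # log(pi(a_t | x_t) / mu(a_t | x_t)) per step
    gamma: float = 0.99,
    rho_bar: float = 1.0,      # truncation level for the TD-error weight
    c_bar: float = 1.0,        # truncation level for the trace coefficient
) -> List[float]:
    """Hypothetical helper: V-trace value targets v_s for a single trajectory."""
    n = len(rewards)
    assert len(values) == n and len(log_rhos) == n
    next_values = values[1:] + [bootstrap_value]
    targets = [0.0] * n
    acc = 0.0  # v_{s+1} - V(x_{s+1}); zero past the end of the trajectory
    for s in reversed(range(n)):
        ratio = math.exp(log_rhos[s])
        rho = min(rho_bar, ratio)
        c = min(c_bar, ratio)
        delta = rho * (rewards[s] + gamma * next_values[s] - values[s])
        # Recursive form: v_s - V(x_s) = delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1}))
        acc = delta + gamma * c * acc
        targets[s] = values[s] + acc
    return targets
```

With identical policies (all log-ratios zero) the targets reduce to ordinary bootstrapped n-step returns; the truncation levels `rho_bar` and `c_bar` cap the importance weights so that a lagging actor cannot inflate the variance of the update.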

October 29, 2019

Taiji: Managing Global User Traffic for Large-Scale Internet Services at the Edge

Symposium on Operating Systems Principles (SOSP)

We present Taiji, a new system for managing user traffic for large-scale Internet services that accomplishes two goals: 1) balancing the utilization of data centers and 2) minimizing network latency of user requests.

By: David Chou, Tianyin Xu, Kaushik Veeraraghavan, Andrew Newell, Sonia Margulis, Lin Xiao, Pol Mauri Ruiz, Justin Meza, Kiryong Ha, Shruti Padmanabha, Kevin Cole, Dmitri Perelman
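Taiji's two goals, balanced data-center utilization and low user latency, pull against each other. The sketch below makes that trade-off concrete with a simple greedy heuristic, which is an illustrative stand-in and not Taiji's actual optimization: each edge location's traffic goes to its lowest-latency data centers until they reach a target utilization. The function name, its inputs, and the `target_util` parameter are hypothetical.

```python
from typing import Dict, List, Tuple


def assign_traffic(
    demand: Dict[str, float],              # traffic per edge location
    capacity: Dict[str, float],            # capacity per data center
    latency: Dict[str, Dict[str, float]],  # latency[edge][dc]
    target_util: float = 0.8,
) -> Tuple[List[Tuple[str, str, float]], Dict[str, float]]:
    """Greedy heuristic (not Taiji's solver): fill nearest data centers up to target_util."""
    load = {dc: 0.0 for dc in capacity}
    plan: List[Tuple[str, str, float]] = []  # (edge, data center, amount)
    # Place the largest demands first so they get the lowest-latency choices.
    for edge in sorted(demand, key=lambda e: demand[e], reverse=True):
        remaining = demand[edge]
        for dc in sorted(capacity, key=lambda d: latency[edge][d]):
            if remaining <= 0:
                break
            room = target_util * capacity[dc] - load[dc]
            if room <= 0:
                continue  # this data center is already at its target utilization
            amount = min(room, remaining)
            plan.append((edge, dc, amount))
            load[dc] += amount
            remaining -= amount
        if remaining > 0:
            raise ValueError(f"not enough capacity under target_util for {edge!r}")
    utilization = {dc: load[dc] / capacity[dc] for dc in capacity}
    return plan, utilization
```

Lowering `target_util` spreads traffic more evenly at the cost of sending some requests to farther data centers; raising it favors latency.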

October 20, 2019

Optimizing and Evaluating Transient Gradual Typing

Dynamic Language Symposium

Gradual typing enables programmers to combine static and dynamic typing in the same language. However, ensuring sound interaction between the static and dynamic parts can incur runtime cost. In this paper, we analyze the performance of the transient design for gradual typing in Reticulated Python, a gradually typed variant of Python.

By: Michael M. Vitousek, Jeremy G. Siek, Avik Chaudhuri
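The transient design checks only a value's outermost type constructor (its "shape") where typed code uses it, rather than wrapping values in proxies. A minimal Python sketch of that idea, assuming a hypothetical `transient` decorator that checks arguments on entry and the result on exit (Reticulated Python inserts its checks by rewriting the program, not through a decorator):

```python
import functools
import inspect
import typing


def _shape(annotation):
    """Outermost runtime class of an annotation: list[int] -> list, int -> int."""
    origin = typing.get_origin(annotation)
    return origin if origin is not None else annotation


def _check(label, value, annotation):
    # Intended for plain classes and generics such as list[int];
    # Any, unions, and other special forms are not supported by this sketch.
    shape = _shape(annotation)
    if isinstance(shape, type) and not isinstance(value, shape):
        raise TypeError(f"{label}: expected {shape.__name__}, got {type(value).__name__}")


def transient(fn):
    """Hypothetical decorator: shallow (shape-only) checks on parameters and return value."""
    sig = inspect.signature(fn)
    hints = typing.get_type_hints(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            if name in hints:
                _check(name, value, hints[name])
        result = fn(*args, **kwargs)
        if "return" in hints:
            _check("return", result, hints["return"])
        return result

    return wrapper


@transient
def first(xs: list[int]) -> int:
    return xs[0]
```

`first(["a"])` passes the entry check because only `list` is verified; the error surfaces at the return check, when the element is actually used as an `int`. Each check is a single `isinstance` call, so the overall cost depends on how often checks execute.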

September 30, 2019

Neural Code Search Evaluation Dataset


Interest in code search using natural language has been growing. Assessing the performance of such code search models can be difficult without a readily available evaluation suite. In this paper, we present an evaluation dataset of natural language query and code snippet pairs to support future work. We also provide the results of two code search models ([6] and [1]) from recent work as a benchmark.

By: Hongyu Li, Seohyun Kim, Satish Chandra
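Benchmarking a code search model on query and snippet pairs comes down to ranking snippets for each query and scoring where the correct one lands. A minimal sketch, assuming mean reciprocal rank and hit rate at k as the metrics and a dict-of-lists input layout (both are assumptions, not necessarily the metrics or format used with this dataset):

```python
from typing import Dict, Sequence, Set


def evaluate_code_search(
    ranked: Dict[str, Sequence[str]],  # query -> snippet ids, best first
    relevant: Dict[str, Set[str]],     # query -> ids of correct snippets
    k: int = 10,
) -> Dict[str, float]:
    """Hypothetical helper: mean reciprocal rank and hit@k over all queries."""
    if not ranked:
        return {"mrr": 0.0, f"hit@{k}": 0.0}
    reciprocal_ranks = 0.0
    hits = 0
    for query, results in ranked.items():
        gold = relevant.get(query, set())
        # 1-based rank of the first relevant snippet, or None if it was not retrieved.
        rank = next((i + 1 for i, snippet in enumerate(results) if snippet in gold), None)
        if rank is not None:
            reciprocal_ranks += 1.0 / rank
            if rank <= k:
                hits += 1
    n = len(ranked)
    return {"mrr": reciprocal_ranks / n, f"hit@{k}": hits / n}
```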

August 15, 2019

Automated Hot Text and Huge Pages: An Easy-to-adopt Solution Towards High Performing Services

International Conference on Web Services (ICWS)

We have reduced CPU usage by applying a solution that first identifies the hot text of a (software) binary and then places the binary on huge pages (i.e., memory pages of 2 MB or larger). The solution is wrapped in an automated framework, enabling service owners to adopt it with little effort.

By: Zhenyun Zhuang, Mark Santaniello, Shumin Zhao, Bikash Sharma, Rajit Kambo
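The first step described above, identifying hot text, can be sketched as picking the functions that account for most CPU samples in a profile and estimating how many huge pages they would occupy once laid out contiguously. The helper below is hypothetical: the profile format, the `coverage` threshold, and the 2 MiB page size (the smallest x86-64 huge page) are assumptions, not details of the paper's framework.

```python
from typing import Dict, List, Tuple

HUGE_PAGE_BYTES = 2 * 1024 * 1024  # assumed 2 MiB huge pages


def select_hot_text(
    samples: Dict[str, int],  # CPU profile samples attributed to each function
    sizes: Dict[str, int],    # machine-code size of each function in bytes
    coverage: float = 0.9,    # fraction of samples the hot set should cover
) -> Tuple[List[str], int]:
    """Hypothetical helper: hottest functions covering `coverage` of samples, plus pages needed."""
    total = sum(samples.values())
    hot: List[str] = []
    covered = 0
    for fn in sorted(samples, key=lambda f: samples[f], reverse=True):
        if covered >= coverage * total:
            break
        hot.append(fn)
        covered += samples[fn]
    hot_bytes = sum(sizes[fn] for fn in hot)
    pages = -(-hot_bytes // HUGE_PAGE_BYTES)  # ceiling division
    return hot, pages
```

In a deployment, the selected functions would then be grouped together at link time and that address range backed by huge pages; this sketch covers only the selection and sizing.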

July 29, 2019

Scaling Static Analyses at Facebook

Communications of the ACM (CACM)

Static analysis tools are programs that examine, and attempt to draw conclusions about, the source of other programs, without running them. At Facebook we have been investing in advanced static analysis tools that employ reasoning techniques similar to those from program verification.

By: Dino Distefano, Manuel Fahndrich, Francesco Logozzo, Peter O'Hearn
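To make "drawing conclusions about a program without running it" concrete, here is a toy intraprocedural taint check written with Python's standard `ast` module. It flags `eval` calls whose argument may derive from `input()`. The class and function names are hypothetical, and the analysis is deliberately naive (no scoping, no control flow, no interprocedural reasoning), far removed from the production analyzers the article describes.

```python
import ast
from typing import List, Set


class TaintCheck(ast.NodeVisitor):
    """Toy check: report line numbers of eval() calls on values read from input()."""

    def __init__(self) -> None:
        self.tainted: Set[str] = set()
        self.findings: List[int] = []

    def _is_tainted(self, node: ast.AST) -> bool:
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == "input":
            return True
        if isinstance(node, ast.Name):
            return node.id in self.tainted
        # Conservatively treat an expression as tainted if any sub-expression is.
        return any(self._is_tainted(child) for child in ast.iter_child_nodes(node))

    def visit_Assign(self, node: ast.Assign) -> None:
        if self._is_tainted(node.value):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    self.tainted.add(target.id)
        self.generic_visit(node)

    def visit_Call(self, node: ast.Call) -> None:
        if isinstance(node.func, ast.Name) and node.func.id == "eval":
            if any(self._is_tainted(arg) for arg in node.args):
                self.findings.append(node.lineno)
        self.generic_visit(node)


def find_eval_of_input(source: str) -> List[int]:
    """Parse (never execute) the source and return lines with suspicious eval calls."""
    checker = TaintCheck()
    checker.visit(ast.parse(source))
    return checker.findings
```

Because the checker never executes the parsed code, it can analyze source that would be unsafe to run. It is also conservative: a variable stays tainted even if it is later overwritten with a constant.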

July 28, 2019

Learning to Optimize Halide with Tree Search and Random Programs


We present a new algorithm to automatically schedule Halide programs for high-performance image processing and deep learning. We significantly improve upon the performance of previous methods, which considered a limited subset of schedules.

By: Andrew Adams, Karima Ma, Luke Anderson, Riyadh Baghdadi, Tzu-Mao Li, Michaël Gharbi, Benoit Steiner, Steven Johnson, Kayvon Fatahalian, Frédo Durand, Jonathan Ragan-Kelley
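The abstract contrasts searching over schedules with earlier methods that considered only a limited subset. A minimal sketch of beam search over a sequence of scheduling decisions, assuming each decision picks one option from a fixed list and that a cost function can score partial schedules; `toy_cost` stands in for a learned cost model and is purely hypothetical:

```python
from typing import Callable, List, Sequence, Tuple

Schedule = Tuple[int, ...]


def beam_search(
    choices: Sequence[Sequence[int]],     # options for each successive decision
    cost: Callable[[Schedule], float],    # scores a (partial) schedule; lower is better
    beam_width: int = 4,
) -> Schedule:
    """Choose one option per decision, keeping the beam_width cheapest partial schedules."""
    beam: List[Schedule] = [()]
    for options in choices:
        candidates = [partial + (opt,) for partial in beam for opt in options]
        candidates.sort(key=cost)
        beam = candidates[:beam_width]
    return beam[0]


def toy_cost(schedule: Schedule) -> float:
    # Hypothetical stand-in for a learned cost model: prefers tile sizes near 16.
    return float(sum(abs(tile - 16) for tile in schedule))
```

With `beam_width=1` this degenerates to greedy search; wider beams keep alternatives alive in case an early choice that looks worse pays off later.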

July 12, 2019

Who’s Afraid of Uncorrectable Bit Errors? Online Recovery of Flash Errors with Distributed Redundancy

USENIX Annual Technical Conference (ATC)

In this paper, we present an approach for addressing the flash lifetime problem by allowing devices to operate at much higher bit error rates. We present DIRECT, a set of techniques that harnesses distributed-level redundancy to enable the adoption of new generations of denser and less reliable flash storage technologies. DIRECT does so by using an end-to-end approach to increase the reliability of distributed storage systems.

By: Amy Tai, Andrew Kryczka, Shobhit O. Kanaujia, Kyle Jamieson, Michael J. Freedman, Asaf Cidon
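The abstract's central idea is that an uncorrectable error on one device need not fail a read when the data is stored redundantly elsewhere in the distributed system. A minimal sketch of such a read path, assuming whole-block replication and CRC32 checksums; the class, its methods, and the in-memory replicas are hypothetical, and DIRECT's own techniques are integrated into real storage systems and are more involved:

```python
import zlib
from typing import Dict, List


class ReplicatedStore:
    """Hypothetical store: each block kept on several replicas, verified with CRC32."""

    def __init__(self, replicas: int = 3) -> None:
        self.replicas: List[Dict[str, bytes]] = [{} for _ in range(replicas)]
        self.checksums: Dict[str, int] = {}

    def put(self, key: str, data: bytes) -> None:
        self.checksums[key] = zlib.crc32(data)
        for replica in self.replicas:
            replica[key] = data

    def get(self, key: str, local: int = 0) -> bytes:
        """Read from the local replica; on a checksum failure, recover from another one."""
        expected = self.checksums[key]
        order = [local] + [i for i in range(len(self.replicas)) if i != local]
        corrupted: List[int] = []
        for i in order:
            data = self.replicas[i].get(key)
            if data is not None and zlib.crc32(data) == expected:
                for j in corrupted:
                    self.replicas[j][key] = data  # rewrite bad copies with the good data
                return data
            corrupted.append(i)
        raise IOError(f"every replica of {key!r} failed its checksum")
```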

July 1, 2019

Multi-Dimensional Balanced Graph Partitioning via Projected Gradient Descent

International Conference on Very Large Data Bases (VLDB)

Motivated by performance optimization of large-scale graph processing systems that distribute the graph across multiple machines, we consider the balanced graph partitioning problem. Unlike most previous work, we study the multi-dimensional variant, in which balance is required with respect to multiple weight functions.

By: Dmitrii Avdiukhin, Sergey Pupyrev, Grigory Yaroslavtsev
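To make the multi-dimensional problem statement concrete, the helper below evaluates a given k-way partition: it counts cut edges and checks that, for every weight function, no part exceeds (1 + ε) times the average load. It does not implement the paper's projected gradient descent; it only scores a candidate partition, assuming edge cut as the objective. The names and the exact form of the balance constraint are assumptions for illustration.

```python
from typing import Dict, Sequence, Tuple


def partition_quality(
    edges: Sequence[Tuple[int, int]],
    parts: Dict[int, int],                # vertex -> part index in range(k)
    weights: Sequence[Dict[int, float]],  # one dict per weight function
    k: int,
    eps: float,
) -> Tuple[int, bool]:
    """Hypothetical helper: edge cut and whether every weight function is balanced."""
    cut = sum(1 for u, v in edges if parts[u] != parts[v])
    balanced = True
    for w in weights:
        load = [0.0] * k
        for vertex, part in parts.items():
            load[part] += w[vertex]
        # Each part may carry at most (1 + eps) times the average load for this weight.
        if max(load) > (1 + eps) * sum(load) / k:
            balanced = False
    return cut, balanced
```

For a four-vertex path, the split {0, 3} | {1, 2} is balanced by vertex count but not by degree, which is the kind of imbalance a single-weight formulation would miss.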