July 29, 2019

Scaling Static Analyses at Facebook

Communications of the ACM (CACM)

Static analysis tools are programs that examine, and attempt to draw conclusions about, the source of other programs, without running them. At Facebook we have been investing in advanced static analysis tools that employ reasoning techniques similar to those from program verification.

By: Dino Distefano, Manuel Fähndrich, Francesco Logozzo, Peter O'Hearn

July 28, 2019

Learning to Optimize Halide with Tree Search and Random Programs


We present a new algorithm to automatically schedule Halide programs for high-performance image processing and deep learning. We significantly improve upon the performance of previous methods, which considered a limited subset of schedules.

By: Andrew Adams, Karima Ma, Luke Anderson, Riyadh Baghdadi, Tzu-Mao Li, Michaël Gharbi, Benoit Steiner, Steven Johnson, Kayvon Fatahalian, Frédo Durand, Jonathan Ragan-Kelley

July 12, 2019

Who’s Afraid of Uncorrectable Bit Errors? Online Recovery of Flash Errors with Distributed Redundancy

USENIX Annual Technical Conference (ATC)

In this paper, we present an approach for addressing the flash lifetime problem by allowing devices to operate at much higher bit error rates. We present DIRECT, a set of techniques that harnesses distributed-level redundancy to enable the adoption of new generations of denser and less reliable flash storage technologies. DIRECT does so by using an end-to-end approach to increase the reliability of distributed storage systems.

By: Amy Tai, Andrew Kryczka, Shobhit O. Kanaujia, Kyle Jamieson, Michael J. Freedman, Asaf Cidon

July 1, 2019

Multi-Dimensional Balanced Graph Partitioning via Projected Gradient Descent

International Conference on Very Large Data Bases (VLDB)

Motivated by performance optimization of large-scale graph processing systems that distribute the graph across multiple machines, we consider the balanced graph partitioning problem. Compared to most of the previous work, we study the multi-dimensional variant in which balance according to multiple weight functions is required.

By: Dmitrii Avdiukhin, Sergey Pupyrev, Grigory Yaroslavtsev

June 24, 2019

SoftSKU: Optimizing Server Architectures for Microservice Diversity @Scale

International Symposium on Computer Architecture (ISCA)

The variety and complexity of microservices in warehouse-scale data centers have grown precipitously over the last few years to support a growing user base and an evolving product portfolio. Despite accelerating microservice diversity, there is a strong requirement to limit diversity in underlying server hardware to maintain hardware resource fungibility, preserve procurement economies of scale, and curb qualification/test overheads.

By: Akshitha Sriraman, Abhishek Dhanotia, Thomas F. Wenisch

May 25, 2019

UBIS: Utilization-aware cluster scheduling

International Parallel and Distributed Processing Symposium (IPDPS)

Data center costs are among the major enterprise expenses, and any improvement in data center resource utilization corresponds to significant savings in true dollars. We focus on the problem of scheduling jobs in distributed execution environments to improve resource utilization.

By: Karthik Kambatla, Vamsee Yarlagadda, Íñigo Goiri, Ananth Grama

April 12, 2019

Presto: SQL on Everything

IEEE International Conference on Data Engineering (ICDE)

Presto is an open source distributed query engine that supports much of the SQL analytics workload at Facebook. Presto is designed to be adaptive, flexible, and extensible.

By: Raghav Sethi, Martin Traverso, Dain Sundstrom, David Phillips, Wenlei Xie, Yutian Sun, Nezih Yigitbasi, Haozhun Jin, Eric Hwang, Nileema Shingte, Christopher Berner

April 2, 2019

Bandana: Using Non-Volatile Memory for Storing Deep Learning Models

Conference on Systems and Machine Learning (SysML)

Typical large-scale recommender systems use deep learning models that are stored in a large amount of DRAM. These models often rely on embeddings, which consume most of the required memory. We present Bandana, a storage system that reduces the DRAM footprint of embeddings by using non-volatile memory (NVM) as the primary storage medium, with a small amount of DRAM as cache.

By: Assaf Eisenman, Maxim Naumov, Darryl Gardner, Misha Smelyanskiy, Sergey Pupyrev, Kim Hazelwood, Asaf Cidon, Sachin Katti

April 1, 2019

PyTorch-BigGraph: A Large-scale Graph Embedding System

Conference on Systems and Machine Learning (SysML)

We present PyTorch-BigGraph (PBG), an embedding system that incorporates several modifications to traditional multi-relation embedding systems that allow it to scale to graphs with billions of nodes and trillions of edges.

By: Adam Lerer, Ledell Wu, Jiajun Shen, Luca Wehrstedt, Abhijit Bose, Alex Peysakhovich

February 20, 2019

BOLT: A Practical Binary Optimizer for Data Centers and Beyond

International Symposium on Code Generation and Optimization (CGO)

In this paper, we present BOLT, a post-link optimizer built on top of the LLVM framework. Utilizing sample-based profiling, BOLT boosts the performance of real-world applications even for highly optimized binaries built with both feedback-driven optimizations (FDO) and link-time optimizations (LTO).

By: Maksim Panchenko, Rafael Auler, Bill Nell, Guilherme Ottoni