
May 25, 2019

UBIS: Utilization-aware cluster scheduling

International Parallel and Distributed Processing Symposium (IPDPS)

Data center costs are among an enterprise's major expenses, and any improvement in data center resource utilization translates into significant savings in real dollars. We focus on the problem of scheduling jobs in distributed execution environments to improve resource utilization.

By: Karthik Kambatla, Vamsee Yarlagadda, Íñigo Goiri, Ananth Grama

April 12, 2019

Presto: SQL on Everything

IEEE International Conference on Data Engineering (ICDE)

Presto is an open source distributed query engine that supports much of the SQL analytics workload at Facebook. Presto is designed to be adaptive, flexible, and extensible.

By: Raghav Sethi, Martin Traverso, Dain Sundstrom, David Phillips, Wenlei Xie, Yutian Sun, Nezih Yigitbasi, Haozhun Jin, Eric Hwang, Nileema Shingte, Christopher Berner

April 2, 2019

Bandana: Using Non-Volatile Memory for Storing Deep Learning Models

Conference on Systems and Machine Learning (SysML)

Typical large-scale recommender systems use deep learning models that are stored in a large amount of DRAM. These models often rely on embeddings, which consume most of the required memory. We present Bandana, a storage system that reduces the DRAM footprint of embeddings by using non-volatile memory (NVM) as the primary storage medium, with a small amount of DRAM as a cache.

By: Assaf Eisenman, Maxim Naumov, Darryl Gardner, Misha Smelyanskiy, Sergey Pupyrev, Kim Hazelwood, Asaf Cidon, Sachin Katti
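The two-tier layout described above (NVM as the primary store, a small DRAM cache in front of it) can be illustrated with a minimal sketch. It assumes a plain LRU eviction policy, which is a stand-in chosen for simplicity and not Bandana's own placement or caching policy; the class name `CachedEmbeddingStore` and its fields are hypothetical.

```python
from collections import OrderedDict


class CachedEmbeddingStore:
    """Hypothetical two-tier embedding store.

    A large dict stands in for NVM; a small OrderedDict used as an LRU
    cache stands in for DRAM. Plain LRU is an assumption, not Bandana's policy.
    """

    def __init__(self, nvm_table, cache_capacity):
        self.nvm = nvm_table             # embedding id -> vector (stand-in for NVM)
        self.capacity = cache_capacity   # number of vectors the "DRAM" tier can hold
        self.cache = OrderedDict()       # stand-in for the DRAM cache
        self.hits = 0
        self.misses = 0

    def lookup(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        vec = self.nvm[key]              # slow path: read the vector from "NVM"
        self.cache[key] = vec
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used vector
        return vec
```

For example, with a capacity of 2, the lookup sequence 1, 2, 1, 3, 2 yields one hit and four misses, leaving ids 3 and 2 in the cache. A real system would size the cache and choose an admission policy from measured access patterns.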

April 1, 2019

PyTorch-BigGraph: A Large-scale Graph Embedding System

Conference on Systems and Machine Learning (SysML)

We present PyTorch-BigGraph (PBG), an embedding system that incorporates several modifications to traditional multi-relation embedding systems that allow it to scale to graphs with billions of nodes and trillions of edges.

By: Adam Lerer, Ledell Wu, Jiajun Shen, Luca Wehrstedt, Abhijit Bose, Alex Peysakhovich
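One of the modifications PBG uses to scale is partitioning the entities into P partitions and grouping the edges into P × P buckets by the partitions of their endpoints. Training on one bucket then only requires the embeddings of two partitions in memory. The sketch below shows only the bucketing step. The modulo-based partition assignment and the function names are hypothetical choices for illustration, not PBG's API.

```python
from collections import defaultdict


def partition_of(node_id: int, num_partitions: int) -> int:
    # Hypothetical assignment: simple modulo hashing of integer node ids.
    return node_id % num_partitions


def bucket_edges(edges, num_partitions):
    """Group (src, relation, dst) edges by (src partition, dst partition)."""
    buckets = defaultdict(list)
    for src, rel, dst in edges:
        key = (partition_of(src, num_partitions), partition_of(dst, num_partitions))
        buckets[key].append((src, rel, dst))
    return dict(buckets)
```

With two partitions, the edges `(0, "follows", 1)` and `(2, "likes", 3)` both land in bucket `(0, 1)`, so a trainer processing that bucket needs only partitions 0 and 1 resident at once.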

February 20, 2019

BOLT: A Practical Binary Optimizer for Data Centers and Beyond

International Symposium on Code Generation and Optimization (CGO)

In this paper, we present BOLT, a post-link optimizer built on top of the LLVM framework. Utilizing sample-based profiling, BOLT boosts the performance of real-world applications even for highly optimized binaries built with both feedback-driven optimizations (FDO) and link-time optimizations (LTO).

By: Maksim Panchenko, Rafael Auler, Bill Nell, Guilherme Ottoni

February 16, 2019

Machine Learning at Facebook: Understanding Inference at the Edge

IEEE International Symposium on High-Performance Computer Architecture (HPCA)

This paper takes a data-driven approach to present the opportunities and design challenges faced by Facebook in order to enable machine learning inference locally on smartphones and other edge platforms.

By: Carole-Jean Wu, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan, Kim Hazelwood, Eldad Isaac, Yangqing Jia, Bill Jia, Tommer Leyvand, Hao Lu, Yang Lu, Lin Qiao, Brandon Reagen, Joe Spisak, Fei Sun, Andrew Tulloch, Peter Vajda, Xiaodong Wang, Yanghan Wang, Bram Wasti, Yiming Wu, Ran Xian, Sungjoo Yoo, Peizhao Zhang

February 13, 2019

SapFix: Automated End-to-End Repair at Scale

International Conference on Software Engineering (ICSE)

We report our experience with SapFix: the first deployment of automated end-to-end fault fixing, from test case design through to deployed repairs in production code.

By: Alexandru Marginean, Johannes Bader, Satish Chandra, Mark Harman, Yue Jia, Ke Mao, Alexander Mols, Andrew Scott

February 1, 2019

Separation Logic

Communications of the ACM (CACM)

In joint work with John Reynolds and others, we developed Separation Logic as a formalism for reasoning about programs that mutate data structures. From a special logic for heaps, it gradually evolved into a general theory for modular reasoning about concurrent as well as sequential programs.

By: Peter O'Hearn
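The modularity mentioned above is captured by the frame rule of Separation Logic, stated here in its standard textbook form: a triple proved about the heap a command touches extends to any disjoint heap $R$, joined with the separating conjunction $\ast$. The side condition is that $C$ modifies no variable free in $R$. The instance below it is an illustrative example, not taken from the article.

```latex
\frac{\{P\}\; C\; \{Q\}}
     {\{P \ast R\}\; C\; \{Q \ast R\}}
\qquad \text{provided } \mathrm{mod}(C) \cap \mathrm{fv}(R) = \emptyset

% Example: updating the cell at x leaves the disjoint cell at y untouched.
\frac{\{x \mapsto -\}\; [x] := 5\; \{x \mapsto 5\}}
     {\{x \mapsto - \ast y \mapsto 3\}\; [x] := 5\; \{x \mapsto 5 \ast y \mapsto 3\}}
```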

January 14, 2019

A True Positives Theorem for a Static Race Detector

Principles of Programming Languages (POPL)

RacerD is a static race detector that has proven effective in engineering practice: developers have fixed thousands of the data races it reported before they reached production, and it has supported the migration of Facebook’s Android app rendering infrastructure from a single-threaded to a multi-threaded architecture.

By: Nikos Gorogiannis, Peter O'Hearn, Ilya Sergey

December 7, 2018

Rethinking floating point for deep learning

Systems for Machine Learning Workshop at NeurIPS 2018

We improve floating point so that it is more energy efficient than integer hardware of equivalent bit width on a 28 nm ASIC process, while retaining accuracy in 8 bits, by combining a novel hybrid log multiply/linear add, Kulisch accumulation, and tapered encodings from Gustafson’s posit format.

By: Jeff Johnson
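The hybrid "log multiply, linear add" idea with exact (Kulisch-style) accumulation can be simulated in software. This is a minimal sketch only. It assumes a fixed-point log2 encoding with `FRAC_BITS` fractional bits and a wide fixed-point accumulator with `ACC_FRAC_BITS` fractional bits. Both values and all function names are hypothetical. The sketch does not reproduce the paper's 8-bit posit-style tapered encoding. Products are formed by adding logarithms and rounded once onto the accumulator grid. They are then summed exactly as integers, and the result is rounded once at the end.

```python
import math

FRAC_BITS = 8        # hypothetical: fractional bits of the log-domain magnitude
ACC_FRAC_BITS = 32   # hypothetical: fractional bits of the fixed-point accumulator


def to_log(x):
    """Encode x as (sign, round(log2|x| * 2^FRAC_BITS)); zero is encoded as None."""
    if x == 0:
        return None
    sign = -1 if x < 0 else 1
    return sign, round(math.log2(abs(x)) * (1 << FRAC_BITS))


def log_mul(a, b):
    """Multiply in the log domain: signs multiply, log magnitudes add."""
    if a is None or b is None:
        return None
    return a[0] * b[0], a[1] + b[1]


def to_linear_fixed(p):
    """Convert a log-domain value to an integer on the accumulator's fixed-point grid."""
    if p is None:
        return 0
    sign, log_mag = p
    return sign * round(2.0 ** (log_mag / (1 << FRAC_BITS)) * (1 << ACC_FRAC_BITS))


def dot(xs, ys):
    """Dot product: log-domain multiplies, exact linear-domain accumulation."""
    acc = 0  # Python int is exact and unbounded, standing in for a wide Kulisch accumulator
    for x, y in zip(xs, ys):
        acc += to_linear_fixed(log_mul(to_log(x), to_log(y)))
    return acc / (1 << ACC_FRAC_BITS)  # single rounding back to a float at the end
```

For inputs that are powers of two, the log encoding is exact, so `dot([1.0, 2.0, -0.5], [4.0, 0.25, 2.0])` returns 3.5. Other inputs, such as 3.0 × 3.0, pick up a small error from rounding the logarithm, and that error shrinks as `FRAC_BITS` grows.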