Explore the latest research from Facebook

All Publications

April 8, 2021 Kiwan Maeng, Shivam Bharuka, Isabel Gao, Mark C. Jeffrey, Vikram Saraph, Bor-Yiing Su, Caroline Trippel, Jiyan Yang, Mike Rabbat, Brandon Lucia, Carole-Jean Wu

CPR: Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery

The paper proposes and optimizes a partial recovery training system, CPR, for recommendation models. CPR relaxes the consistency requirement by allowing non-failed nodes to proceed without loading checkpoints when a node fails during training, reducing failure-related overheads. To the best of our knowledge, this paper is the first to perform a data-driven, in-depth analysis of applying partial recovery to recommendation models, and it identifies a trade-off between accuracy and performance.
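To illustrate the core idea, here is a minimal sketch of partial versus full recovery for sharded training; every name (Shard, save_checkpoint, and so on) is invented for the example and is not taken from the paper.

```python
# Minimal sketch of partial-recovery checkpointing for sharded training.

import copy

class Shard:
    """One parameter shard owned by one trainer node."""
    def __init__(self, node_id, params):
        self.node_id = node_id
        self.params = params        # live parameters
        self.checkpoint = None      # last saved snapshot
        self.failed = False

    def save_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.params)

def full_recovery(shards):
    # Classic approach: every node rolls back to the last checkpoint,
    # discarding all progress made since it was taken.
    for s in shards:
        s.params = copy.deepcopy(s.checkpoint)
        s.failed = False

def partial_recovery(shards):
    # CPR-style approach: only the failed node reloads its checkpoint;
    # survivors keep their newer parameters and training continues.
    # The recovered shard is slightly stale, trading a small accuracy
    # risk for much lower failure-recovery cost.
    for s in shards:
        if s.failed:
            s.params = copy.deepcopy(s.checkpoint)
            s.failed = False
```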
Paper
March 31, 2021 Jie M. Zhang, Mark Harman

“Ignorance and Prejudice” in Software Fairness

Machine learning software can be unfair when making human-related decisions, exhibiting prejudice against certain groups of people. Existing work primarily focuses on proposing fairness metrics and presenting fairness improvement approaches. It remains unclear how key aspects of any machine learning system, such as the feature set and training data, affect fairness. This paper presents results from a comprehensive study that addresses this problem.
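To make the notion concrete, here is a minimal sketch of one widely used group-fairness metric, statistical parity difference; the toy predictions stand in for two hypothetical models trained on different feature sets and are not data from the paper.

```python
# Statistical parity difference: P(pred=1 | group=0) - P(pred=1 | group=1).

def statistical_parity_difference(y_pred, group):
    """0 means both groups receive positive decisions at the same rate;
    larger magnitude means a larger disparity."""
    rate = {}
    for g in (0, 1):
        preds = [p for p, grp in zip(y_pred, group) if grp == g]
        rate[g] = sum(preds) / len(preds)
    return rate[0] - rate[1]

group       = [0, 0, 0, 0, 1, 1, 1, 1]
preds_set_a = [1, 1, 1, 0, 1, 0, 0, 0]   # model trained on feature set A
preds_set_b = [1, 1, 0, 0, 1, 1, 0, 0]   # model trained on feature set B

print(statistical_parity_difference(preds_set_a, group))  # 0.5
print(statistical_parity_difference(preds_set_b, group))  # 0.0
```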
Paper
March 31, 2021 Giovani Guizzo, Justyna Petke, Federica Sarro, Mark Harman

Enhancing Genetic Improvement of Software with Regression Test Selection

Genetic Improvement (GI) uses artificial intelligence to automatically improve software with respect to non-functional properties (AI for SE). In this paper, we propose the use of existing software engineering best practice to enhance GI (SE for AI). We conjecture that existing Regression Test Selection (RTS) techniques, which have proven efficient and effective, can and should be used as a core component of the GI search process to maximize its effectiveness.
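A minimal sketch of the conjectured combination, assuming hypothetical coverage_map, run_test, and measure helpers (no names here come from the paper): RTS keeps only the tests a candidate patch can actually affect, shrinking each fitness evaluation in the GI loop.

```python
# Sketch of regression test selection inside a GI fitness evaluation.

def select_tests(changed_lines, coverage_map):
    """coverage_map: test name -> set of source lines it covers.
    Keep only tests whose coverage overlaps the patched lines."""
    return [t for t, covered in coverage_map.items()
            if covered & changed_lines]

def fitness(variant, changed_lines, coverage_map, run_test, measure):
    # Run just the selected tests rather than the whole suite.
    for test in select_tests(changed_lines, coverage_map):
        if not run_test(variant, test):
            return float("-inf")   # behaviour broken: reject the patch
    return -measure(variant)       # e.g. negative runtime, to maximize
```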
Paper
March 31, 2021 Wei Ma, Thierry Titcheu Chekam, Mike Papadakis, Mark Harman

MuDelta: Delta-Oriented Mutation Testing at Commit Time

We introduce MuDelta, an approach that identifies commit-relevant mutants: mutants that affect and are affected by the changed program behaviors. Our approach applies machine learning to a combined scheme of graph-based and vector-based representations of static code features. Our results, from 50 commits in 21 Coreutils programs, demonstrate the strong predictive ability of our approach, yielding AUC values of 0.80 (ROC) and 0.50 (PR curve), with precision of 0.63 and recall of 0.32.
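For readers less familiar with the reported measures, the sketch below computes ROC AUC, PR-curve AUC, precision, and recall with scikit-learn on invented toy predictions; only the metric names come from the abstract.

```python
# Computing the abstract's evaluation metrics on toy data.

from sklearn.metrics import (average_precision_score, precision_score,
                             recall_score, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                 # 1 = commit-relevant mutant
y_score = [0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1] # model confidence
y_pred  = [1 if s >= 0.5 else 0 for s in y_score]  # thresholded decision

print("ROC AUC  :", roc_auc_score(y_true, y_score))
print("PR AUC   :", average_precision_score(y_true, y_score))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
```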
Paper
March 3, 2021 Maksim Panchenko, Rafael Auler, Laith Sakka, Guilherme Ottoni

Lightning BOLT: Powerful, Fast, and Scalable Binary Optimization

Profile-guided binary optimization has proved to be an important technology for achieving peak performance, particularly for the large-scale binaries typical of data-center applications. By applying profile data at the same representation level at which sampling-based profiles are collected, binary optimizers can provide double-digit speedups over binaries compiled with profile-guided optimizations using similarly collected profile data.
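A typical BOLT workflow can be sketched as below, driven from Python purely for illustration: sample the running binary with hardware last-branch-record profiling via perf, convert the profile with perf2bolt, and rewrite the binary with llvm-bolt. The flag set follows BOLT's public documentation but varies across versions, so treat it as indicative rather than exact.

```python
# Sketch of a profile-collect-and-rewrite BOLT pipeline (flags vary by version).

import subprocess

app = "./app"

# 1. Sample the production workload with last-branch-record profiling.
subprocess.run(["perf", "record", "-e", "cycles:u", "-j", "any,u",
                "-o", "perf.data", "--", app], check=True)

# 2. Convert the perf profile into BOLT's input format.
subprocess.run(["perf2bolt", "-p", "perf.data", "-o", "perf.fdata", app],
               check=True)

# 3. Rewrite the binary: reorder blocks and functions, split cold code.
subprocess.run(["llvm-bolt", app, "-o", app + ".bolt",
                "-data=perf.fdata",
                "-reorder-blocks=ext-tsp",
                "-reorder-functions=hfsort",
                "-split-functions", "-split-all-cold"], check=True)
```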
Paper
March 3, 2021 Hyoukjun Kwon, Liangzhen Lai, Michael Pellauer, Tushar Krishna, Yu-Hsin Chen, Vikas Chandra

Heterogeneous Dataflow Accelerators for Multi-DNN Workloads

This work proposes a new class of accelerators, heterogeneous dataflow accelerators (HDAs), which deploy multiple accelerator substrates (i.e., sub-accelerators), each supporting a different dataflow. HDAs enable coarser-grained dataflow flexibility than reconfigurable dataflow accelerators (RDAs), with higher energy efficiency and a lower area cost comparable to that of fixed dataflow accelerators (FDAs).
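The scheduling idea behind an HDA can be sketched as follows: given sub-accelerators with fixed but different dataflows, assign each layer to the substrate whose dataflow fits it best. The dataflow names and cost numbers below are invented for illustration and are not from the paper.

```python
# Sketch of HDA-style layer scheduling (illustrative cost model only).

# Hypothetical energy-delay cost of a layer type on each dataflow.
COST = {
    ("conv", "weight-stationary"): 1.0,
    ("conv", "output-stationary"): 1.6,
    ("fc",   "weight-stationary"): 2.1,
    ("fc",   "output-stationary"): 0.9,
}

# Two sub-accelerators, each with a fixed dataflow.
SUB_ACCELERATORS = {"A": "weight-stationary", "B": "output-stationary"}

def schedule(layers):
    """Greedily map each layer to the cheapest sub-accelerator."""
    return [
        (layer, min(SUB_ACCELERATORS,
                    key=lambda acc: COST[(layer, SUB_ACCELERATORS[acc])]))
        for layer in layers
    ]

print(schedule(["conv", "conv", "fc"]))
# [('conv', 'A'), ('conv', 'A'), ('fc', 'B')]
```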
Paper
March 1, 2021 Guilherme Ottoni, Bin Liu

HHVM Jump-Start: Boosting Both Warmup and Steady-State Performance at Scale

This paper presents the Jump-Start mechanism implemented inside the HipHop Virtual Machine (HHVM). Jump-Start is a practical approach to sharing VM profile data at a large scale, and it is used to power one of the largest websites in the world.
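Conceptually, the profile-sharing step looks like the sketch below: one warmed-up "seeder" instance exports its JIT profile, and freshly started instances import it to skip most of the warmup. All names and the serialization format here are hypothetical; the paper describes HHVM's actual mechanism.

```python
# Illustrative sketch of sharing VM profile data across instances.

import json

def export_profile(jit_counters, path):
    # A warmed-up "seeder" instance serializes its JIT profile.
    with open(path, "w") as f:
        json.dump(jit_counters, f)

def import_profile(path):
    # A fresh instance loads the shared profile, so it can start
    # optimized compilation immediately instead of re-profiling.
    with open(path) as f:
        return json.load(f)

export_profile({"Foo::bar": 10421, "Baz::qux": 88}, "/tmp/jumpstart.profile")
counters = import_profile("/tmp/jumpstart.profile")
```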
Paper
February 27, 2021 Bilge Acun, Matthew Murphy, Xiaodong Wang, Jade Nie, Carole-Jean Wu, Kim Hazelwood

Understanding Training Efficiency of Deep Learning Recommendation Models at Scale

The goal of this paper is to explain the intricacies of using GPUs to train recommendation models, the factors affecting hardware efficiency at scale, and learnings from Zion, a new scale-up GPU server design.
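One such intricacy is arithmetic intensity: embedding-table lookups are memory-bound gathers while MLP layers are dense compute, so the two stress a GPU very differently. The back-of-envelope sketch below uses invented sizes; the numbers illustrate the gap and are not from the paper.

```python
# Rough arithmetic-intensity comparison (illustrative sizes, fp32).

batch, lookups, dim = 4096, 100, 128

# Embedding stage: almost no FLOPs, lots of random memory traffic.
embed_bytes = batch * lookups * dim * 4        # gathered table bytes
embed_flops = batch * lookups * dim            # pooling adds, roughly

# One dense MLP layer: compute-heavy.
in_f, out_f = 1024, 1024
mlp_flops = 2 * batch * in_f * out_f
mlp_bytes = (in_f * out_f + batch * (in_f + out_f)) * 4

print("embedding FLOPs/byte:", embed_flops / embed_bytes)  # 0.25
print("MLP       FLOPs/byte:", mlp_flops / mlp_bytes)      # ~228
```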
Paper
February 23, 2021 Satadru Pan, Theano Stavrinos, Yunqiao Zhang, Atul Sikaria, Pavel Zakharov, Abhinav Sharma, Shiva Shankar, Mike Shuey, Richard Wareing, Monika Gangapuram, Guanglei Cao, Christian Preseau, Pratap Singh, Kestutis Patiejunas, JR Tipton, Ethan Katz-Bassett, Wyatt Lloyd

Facebook’s Tectonic Filesystem: Efficiency from Exascale

This paper describes Tectonic’s design, explaining how it achieves scalability, supports multitenancy, and allows tenants to specialize operations to optimize for diverse workloads. The paper also presents insights from designing, deploying, and operating Tectonic.
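One way to picture tenant specialization is per-tenant storage settings, such as the durability encoding used for writes. The sketch below is purely illustrative: the option names are hypothetical, and the real mechanisms are described in the paper.

```python
# Illustrative per-tenant storage configuration (hypothetical names).

from dataclasses import dataclass

@dataclass
class TenantConfig:
    name: str
    encoding: str       # "replication" for hot data, "reed-solomon" for space efficiency
    write_quorum: int   # acks required before a write is durable

configs = [
    TenantConfig("blob-storage",   "reed-solomon", write_quorum=2),
    TenantConfig("data-warehouse", "reed-solomon", write_quorum=3),
    TenantConfig("hot-metadata",   "replication",  write_quorum=2),
]

for c in configs:
    print(f"{c.name}: {c.encoding}, quorum={c.write_quorum}")
```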
Paper
February 22, 2021 Harish Dattatraya Dixit, Sneha Pendharkar, Matt Beadon, Chris Mason, Tejasvi Chakravarthy, Bharath Muthiah, Sriram Sankar

Silent Data Corruptions at Scale

In this paper, we describe common defect types observed in silicon manufacturing that lead to silent data corruptions (SDCs). We discuss a real-world example of silent data corruption within a datacenter application.
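A minimal sketch of the kind of "known answer" screen used to hunt for such defects: pin a deterministic computation to each core and compare against a trusted reference. This is illustrative only; the affinity call is Linux-specific, and production screens are far more elaborate.

```python
# Sketch of a known-answer silent-data-corruption screen (Linux-only).

import math
import os

def known_answer_test():
    # Any deterministic computation with a trusted reference works; the
    # paper's real-world case involved a wrong result from a floating-
    # point power computation of this flavor on a single defective core.
    return math.pow(1.1, 53)

REFERENCE = known_answer_test()   # in practice, taken from a known-good host

def scan_core(core_id):
    os.sched_setaffinity(0, {core_id})   # pin this process to one core
    return known_answer_test() == REFERENCE

suspect = [c for c in range(os.cpu_count()) if not scan_core(c)]
print("suspect cores:", suspect)
```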
Paper