Explore the latest research from Facebook

All Publications

November 4, 2020 Xiao Shi, Scott Pruett, Kevin Doherty, Jinyu Han, Dmitri Petrov, Jim Carrig, John Hugg, Nathan Bronson
Paper

FlightTracker: Consistency across Read-Optimized Online Stores at Facebook

This paper introduces FlightTracker, a family of APIs and systems that now manage consistency for online access to Facebook’s graph. FlightTracker implicitly provides read-your-writes (RYW) consistency and can be used explicitly to provide alternative consistency guarantees for special use cases. It enables flexible communication patterns between caches, which we have found important as the number of datacenters increases. It extends the same consistency guarantees to cross-shard indexes and materialized views, allowing us to transparently optimize queries. Finally, it provides a uniform primitive for clients to obtain desired consistency guarantees across a variety of data stores.
November 4, 2020 Benjamin Berg, Daniel S. Berger, Sara McAllister, Isaac Grosof, Sathya Gunasekar, Jimmy Lu, Michael Uhlar, Jim Carrig, Nathan Beckmann, Mor Harchol-Balter, Gregory G. Ganger
Paper

The CacheLib Caching Engine: Design and Experiences at Scale

This paper presents a different approach to cache development, successfully employed at Facebook, which extracts a core set of common requirements and functionality from otherwise disjoint caching systems. CacheLib is a general-purpose caching engine, designed based on experiences with a range of caching use cases at Facebook, that facilitates the easy development and maintenance of caches.
October 30, 2020 James Thorne, Majid Yazdani, Marzieh Saeidi, Sebastian Riedel, Alon Halevy
Paper

Neural Database Operator Model

Our goal is to answer queries over facts stored in a text memory. The key challenge in NeuralDBs (Thorne et al., 2020), compared to open-book NLP such as question answering (Rajpurkar et al., 2016, inter alia), is that possibly thousands of facts must be aggregated to provide a single answer, without direct supervision.
October 9, 2020 Neta Barkay, Curtiss Cobb, Roee Eilat, Tal Galili, Daniel Haimovich, Sarah LaRocca, Katherine Morris, Tal Sarig
Paper

Weights and Methodology Brief for the COVID-19 Symptom Survey by University of Maryland and Carnegie Mellon University, in Partnership with Facebook

The Facebook company is partnering with academic institutions to support COVID-19 research and to help inform public health decisions. Currently, we are inviting Facebook app users in the United States to take a survey collected by faculty at Carnegie Mellon University (CMU) Delphi Research Center, and we are inviting Facebook app users in more than 200 countries or territories globally to take a survey collected by faculty at the University of Maryland (UMD) Joint Program in Survey Methodology.
August 31, 2020 Yoshinori Matsunobu, Siying Dong, Herman Lee
Paper

MyRocks: LSM-Tree Database Storage Engine Serving Facebook’s Social Graph

In this paper, we describe our journey to build and run an OLTP LSM-tree SQL database at scale. We also discuss the features we implemented to keep pace with UDB workloads, what made migrations easier, and what operational and software development challenges we faced during the two years of running MyRocks in production.
June 8, 2020 Fred Lin, Keyur Muzumdar, Nikolay Laptev, Mihai-Valentin Curelea, Seunghak Lee, Sriram Sankar
Paper

Fast Dimensional Analysis for Root Cause Investigation in a Large-Scale Service Environment

In this paper, we present a fast dimensional analysis framework that automates root cause analysis on structured logs with improved scalability.
February 24, 2020 Zhichao Cao, Siying Dong, Sagar Vemuri, David H.C. Du
Paper

Characterizing, Modeling, and Benchmarking RocksDB Key-Value Workloads at Facebook

In this paper, we present a detailed characterization of workloads from three typical RocksDB production use cases at Facebook: UDB (a MySQL storage layer for social graph data), ZippyDB (a distributed key-value store), and UP2X (a distributed key-value store for AI/ML services).
July 12, 2019 Amy Tai, Andrew Kryczka, Shobhit O. Kanaujia, Kyle Jamieson, Michael J. Freedman, Asaf Cidon
Paper

Who’s Afraid of Uncorrectable Bit Errors? Online Recovery of Flash Errors with Distributed Redundancy

In this paper, we present an approach for addressing the flash lifetime problem by allowing devices to operate at much higher bit error rates. We present DIRECT, a set of techniques that harnesses distributed-level redundancy to enable the adoption of new generations of denser and less reliable flash storage technologies. DIRECT does so by using an end-to-end approach to increase the reliability of distributed storage systems.
July 1, 2019 Dmitrii Avdiukhin, Sergey Pupyrev, Grigory Yaroslavtsev
Paper

Multi-Dimensional Balanced Graph Partitioning via Projected Gradient Descent

Motivated by performance optimization of large-scale graph processing systems that distribute the graph across multiple machines, we consider the balanced graph partitioning problem. Unlike most previous work, we study the multi-dimensional variant, in which the partition must be balanced with respect to multiple weight functions.
April 23, 2018 Assaf Eisenman, Darryl Gardner, Islam AbdelRahman, Jens Axboe, Siying Dong, Kim Hazelwood, Chris Petersen, Asaf Cidon, Sachin Katti
Paper

Reducing DRAM Footprint with NVM in Facebook

In this work, we design a key-value store, MyNVM, which leverages an NVM block device to reduce DRAM usage and total cost of ownership, while providing latency and queries-per-second (QPS) comparable to MyRocks on a server with a much larger amount of DRAM. Replacing DRAM with NVM introduces several challenges. In particular, NVM has limited read bandwidth, and it wears out quickly under a high write bandwidth.