March 15, 2016

Social Hash: an Assignment Framework for Optimizing Distributed Systems Operations on Social Networks

USENIX Symposium on Networked Systems Design and Implementation (NSDI 2016)

We describe the social hash framework, which uses graph partitioning techniques to improve the performance of systems within Facebook. We highlight two applications: (1) routing similar users to the same web cluster, which improves our cache performance, and (2) co-locating socially similar data on the same host, which improves the performance of data serving systems.

By: Alon Shalita, Brian Karrer, Igor Kabiljo, Arun Sharma, Alessandro Presta, Aaron Adcock, Herald Kllapi, Michael Stumm
October 4, 2015

Existential Consistency: Measuring and Understanding Consistency at Facebook

The 25th ACM Symposium on Operating Systems Principles

Replicated storage for large Web services faces a trade-off between stronger forms of consistency and higher performance properties. Stronger consistency prevents anomalies, i.e., unexpected behavior visible to users, and reduces programming complexity.

By: Haonan Lu, Kaushik Veeraraghavan, Philippe Ajoux, Jim Hunt, Yee Jiun Song, Wendy Tobagus, Sanjeev Kumar, Wyatt Lloyd
October 4, 2015

Holistic Configuration Management at Facebook

The 25th ACM Symposium on Operating Systems Principles

This paper gives a comprehensive description of the use cases, design, implementation, and usage statistics of a suite of tools that manage Facebook’s configuration end-to-end, including the frontend products, backend systems, and mobile apps.

By: Chunqiang (CQ) Tang, Thawan Kooburat, Pradeep Venkat, Akshay Chander, Zhe Wen, Aravind Narayanan, Patrick Dowell, Robert Karl
August 31, 2015

One Trillion Edges: Graph Processing at Facebook-Scale

The 41st International Conference on Very Large Data Bases

Analyzing large graphs provides valuable insights for social networking and web companies in content ranking and recommendations. While numerous graph processing systems have been developed and evaluated on available benchmark graphs of up to 6.6B edges, they often face significant difficulties in scaling to much larger graphs. Industry graphs can be two orders of magnitude larger: hundreds of billions or up to one trillion edges.

By: Avery Ching, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis, Sambavi Muthukrishnan
August 31, 2015

Cubrick: A Scalable Distributed MOLAP Database for Fast Analytics

41st International Conference on Very Large Data Bases (Ph.D. Workshop)

This paper describes the architecture and design of Cubrick, a distributed multidimensional in-memory database that enables real-time data analysis of large dynamic datasets. Cubrick has a strictly multidimensional data model composed of dimensions, dimensional hierarchies and metrics, supporting sub-second MOLAP operations such as slice and dice, roll-up and drill-down over terabytes of data.

By: Pedro Pedreira, Luis Erpen de Bona, Chris Croswhite
August 17, 2015

Inside the Social Network’s (Datacenter) Network

SIGCOMM ’15

Large cloud service providers have invested in increasingly larger datacenters to house the computing infrastructure required to support their services.

By: Arjun Roy, James Hongyi Zeng, Jasmeet Bagga, George Porter, Alex C. Snoeren
August 13, 2015

From Categorical Logic to Facebook Engineering

30th Annual ACM/IEEE Symposium on Logic in Computer Science

I chart a line of development from category-theoretic models of programs and logics to automatic program verification/analysis techniques that are in deployment at Facebook. Our journey takes in a number of concepts from the computer science logician’s toolkit – including categorical logic and model theory, denotational semantics, the Curry-Howard isomorphism, substructural logic, Hoare Logic and Separation Logic, abstract interpretation, compositional program analysis, the frame problem, and abductive inference.

By: Peter O'Hearn
June 22, 2015

Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field

IEEE/IFIP International Conference on Dependable Systems and Networks

In this paper, we analyze the memory errors in the entire fleet of servers at Facebook over the course of fourteen months, representing billions of device days.

By: Justin Meza, Qiang Wu, Sanjeev Kumar, Onur Mutlu
June 15, 2015

A Large-Scale Study of Flash Memory Failures in the Field

ACM SIGMETRICS 2015

This paper presents the first large-scale study of flash-based SSD reliability in the field.

By: Justin Meza, Qiang Wu, Sanjeev Kumar, Onur Mutlu
May 19, 2015

Challenges to Adopting Stronger Consistency at Scale

Workshop on Hot Topics in Operating Systems

There have been many recent advances in distributed systems that provide stronger semantics for geo-replicated data stores like those underlying Facebook. At Facebook we are excited by these lines of research, but fundamental and operational challenges currently make it infeasible to incorporate these advances into deployed systems. This paper describes some of these challenges with the hope that future advances will address them.

By: Philippe Ajoux, Nathan Bronson, Sanjeev Kumar, Wyatt Lloyd, Kaushik Veeraraghavan