Core Systems

Distributed systems for large-scale, geo-replicated infrastructure

About Core Systems

Facebook Core Systems researchers and engineers design and build the distributed systems that power Facebook’s infrastructure. Our work spans the full engineering spectrum of research, development, deployment, and production, ensuring that our systems run efficiently, reliably, and securely across millions of machines in tens of geo-replicated data center regions.

Core Systems conducts forward-looking research in distributed systems and architecture at global scale. Billions of people rely on the services we build and manage to connect and communicate. Throughout the lifecycle of these distributed services, we encounter fundamental research challenges in many areas, including capacity management, configuration management, cluster management, deployment, distributed tracing, efficiency, fault tolerance, monitoring, performance, power management, reliability, routing, scalability, service discovery, and storage systems.

We collaborate closely with leading experts in academia through Distributed Systems PhD fellowships, requests for proposals, faculty summits, internships, and visiting researcher programs.

In recent years, we’ve published work on cluster management (Twine, OSDI 2020), configuration management (Configerator, SOSP 2015), fault tolerance (Kraken, OSDI 2016; Maelstrom, OSDI 2018; Taiji, SOSP 2019), distributed tracing (Canopy, SOSP 2017), data center power management (Dynamo, ISCA 2016), and consensus protocols (Delos, OSDI 2020). See our Publications page for a full list of our published research.

Latest Publications

Virtual Consensus in Delos

Mahesh Balakrishnan, Jason Flinn, Chen Shen, Mihir Dharamshi, Ahmed Jafri, Xiao Shi, Santosh Ghosh, Hazem Hassan, Aaryaman Sagar, Rhed Shi, Jingming Liu, Filip Gruszczynski, Xianan Zhang, Huy Hoang, Ahmed Yossef, Francois Richard, Yee Jiun Song

OSDI - November 4, 2020

Twine: A Unified Cluster Management System for Shared Infrastructure

Chunqiang (CQ) Tang, Kenny Yu, Kaushik Veeraraghavan, Jonathan Kaldor, Scott Michelson, Thawan Kooburat, Aravind Anbudurai, Matthew Clark, Kabir Gogia, Long Cheng, Ben Christensen, Alex Gartrell, Maxim Khutornenko, Sachin Kulkarni, Marcin Pawlowski, Tuomas Pelkonen, Andre Rodrigues, Rounak Tibrewal, Vaishnavi Venkatesan, Peter Zhang

OSDI - November 4, 2020

FlightTracker: Consistency across Read-Optimized Online Stores at Facebook

Xiao Shi, Scott Pruett, Kevin Doherty, Jinyu Han, Dmitri Petrov, Jim Carrig, John Hugg, Nathan Bronson

OSDI - November 4, 2020

Taiji: Managing Global User Traffic for Large-Scale Internet Services at the Edge

David Chou, Tianyin Xu, Kaushik Veeraraghavan, Andrew Newell, Sonia Margulis, Lin Xiao, Pol Mauri Ruiz, Justin Meza, Kiryong Ha, Shruti Padmanabha, Kevin Cole, Dmitri Perelman

SOSP - October 29, 2019