Publication

Realtime Data Processing at Facebook

ACM SIGMOD


Abstract

Realtime data processing powers many use cases at Facebook, including realtime reporting of the aggregated, anonymized voice of Facebook users, analytics for mobile applications, and insights for Facebook page administrators. Many companies have developed their own systems; we have a realtime data processing ecosystem at Facebook that handles hundreds of Gigabytes per second across hundreds of data pipelines.

Many decisions must be made while designing a realtime stream processing system. In this paper, we identify five important design decisions that affect their ease of use, performance, fault tolerance, scalability, and correctness. We compare the alternative choices for each decision and contrast what we built at Facebook to other published systems.

Our main decision was targeting seconds of latency, not milliseconds. Seconds is fast enough for all of the use cases we support and it allows us to use a persistent message bus for data transport. This data transport mechanism then paved the way for fault tolerance, scalability, and multiple options for correctness in our stream processing systems Puma, Swift, and Stylus.

We then illustrate how our decisions and systems satisfy our requirements for multiple use cases at Facebook. Finally, we reflect on the lessons we learned as we built and operated these systems.

Related Publications

All Publications

Coordinated Priority-aware Charging of Distributed Batteries in Oversubscribed Data Centers

Sulav Malla, Qingyuan Deng, Zoh Ebrahimzadeh, Joe Gasperetti, Sajal Jain, Parimala Kondety, Thiara Ortiz, Debra Vieira

MICRO - October 17, 2020

11-Gbps Broadband Modem-Agnostic Line-of-Sight MIMO Over the Range of 13 km

Yan Yan, Pratheep Bondalapati, Abhishek Tiwari, Chiyun Xia, Andy Cashion, Dawei Zhang, Tobias Tiecke, Qi Tang, Michael Reed, Dudi Shmueli, Hongyu Zhou, Bob Proctor, Joseph Stewart

IEEE GLOBECOM - January 21, 2019

Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems

Maxim Naumov, John Kim, Dheevatsa Mudigere, Srinivas Sridharan, Xiaodong Wang, Whitney Zhao, Serhat Yilmaz, Changkyu Kim, Hector Yuen, Mustafa Ozdal, Krishnakumar Nair, Isabel Gao, Bor-Yiing Su, Jiyan Yang, Mikhail Smelyanskiy

arXiv - September 3, 2020

PyTorch Distributed: Experiences on Accelerating Data Parallel Training

Shen Li, Yanli Zhao, Rohan Verma, Omkar Salpekar, Pieter Noordhuis, Teng Li, Adam Paszke, Jeff Smith, Brian Vaughan, Pritam Damania, Soumith Chintala

VLDB - August 31, 2020

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookies Policy