Fast Database Restarts at Facebook

ACM Special Interest Group on Management of Data (SIGMOD)


Facebook engineers query multiple databases to monitor and analyze Facebook products and services. The fastest of these databases is Scuba, which achieves subsecond query response time by storing all of its data in memory across hundreds of servers. We are continually improving the code for Scuba and would like to push new software releases at least once a week. However, restarting a Scuba machine clears its memory. Recovering all of its data from disk — about 120 GB per machine — takes 2.5-3 hours to read and format the data per machine. Even 10 minutes is a long downtime for the critical applications that rely on Scuba, such as detecting user-facing errors. Restarting only 2% of the servers at a time mitigates the amount of unavailable data, but prolongs the restart duration to about 12 hours, during which users see only partial query results and one engineer needs to monitor the servers carefully. We need a faster, less engineer intensive, solution to enable frequent software upgrades.

In this paper, we show that using shared memory provides a simple, effective, fast, solution to upgrading servers. Our key observation is that we can decouple the memory lifetime from the process lifetime. When we shutdown a server for a planned upgrade, we know that the memory state is valid (unlike when a server shuts down unexpectedly). We can therefore use shared memory to preserve memory state from the old server process to the new process. Our solution does not increase the server memory footprint and allows recovery at memory speeds, about 2-3 minutes per server. This solution maximizes uptime and availability, which has led to much faster and more frequent rollouts of new features and improvements. Furthermore, this technique can be applied to the in-memory state of any database, even if the memory contains a cache of a much larger disk-resident data set, as in most databases.

Related Publications

All Publications

Coordinated Priority-aware Charging of Distributed Batteries in Oversubscribed Data Centers

Sulav Malla, Qingyuan Deng, Zoh Ebrahimzadeh, Joe Gasperetti, Sajal Jain, Parimala Kondety, Thiara Ortiz, Debra Vieira

MICRO - October 17, 2020

11-Gbps Broadband Modem-Agnostic Line-of-Sight MIMO Over the Range of 13 km

Yan Yan, Pratheep Bondalapati, Abhishek Tiwari, Chiyun Xia, Andy Cashion, Dawei Zhang, Tobias Tiecke, Qi Tang, Michael Reed, Dudi Shmueli, Hongyu Zhou, Bob Proctor, Joseph Stewart

IEEE GLOBECOM - January 21, 2019

Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems

Maxim Naumov, John Kim, Dheevatsa Mudigere, Srinivas Sridharan, Xiaodong Wang, Whitney Zhao, Serhat Yilmaz, Changkyu Kim, Hector Yuen, Mustafa Ozdal, Krishnakumar Nair, Isabel Gao, Bor-Yiing Su, Jiyan Yang, Mikhail Smelyanskiy

arXiv - September 3, 2020

PyTorch Distributed: Experiences on Accelerating Data Parallel Training

Shen Li, Yanli Zhao, Rohan Verma, Omkar Salpekar, Pieter Noordhuis, Teng Li, Adam Paszke, Jeff Smith, Brian Vaughan, Pritam Damania, Soumith Chintala

VLDB - August 31, 2020

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookies Policy