Accelerometer: Understanding Acceleration Opportunities for Data Center Overheads at Hyperscale

Architectural Support for Programming Languages and Operating Systems (ASPLOS)


At global user population scale, important microservices in warehouse-scale data centers can grow to account for an enormous installed base of servers. With the end of Dennard scaling, successive server generations running these microservices exhibit diminishing performance returns. Hence, it is imperative to understand how important microservices spend their CPU cycles to determine acceleration opportunities across the global server fleet. To this end, we first undertake a comprehensive characterization of the top seven microservices that run on the compute-optimized data center fleet at Facebook.

Our characterization reveals that microservices spend as few as 18% of CPU cycles executing core application logic (e.g., performing a key-value store operation); the remaining cycles are spent in common operations that are not core to the application logic (e.g., I/O processing, logging, and compression). Accelerating such common building blocks can greatly improve data center performance. Whereas developing specialized hardware acceleration for each building block might be beneficial, it becomes risky at scale if these accelerators do not yield expected gains due to performance bounds precipitated by offload-induced overheads. To identify such performance bounds early in the hardware design phase, we develop an analytical model, Accelerometer, for hardware acceleration that projects realistic speedup in microservices. We validate Accelerometer’s utility in production using three retrospective case studies and demonstrate how it estimates the real speedup with ≤ 3.7% error. We then use Accelerometer to project gains from accelerating important common building blocks identified by our characterization.
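To illustrate why offload-induced overheads can bound the gains from an accelerator, the following is a minimal sketch of an Amdahl's-law-style estimate with an added per-offload cost. It is not the Accelerometer model itself, whose equations are not given in this abstract; the function name, parameters, and the simple additive overhead term are all assumptions made for illustration.

```python
def projected_speedup(total_cycles, kernel_fraction, accel_factor,
                      offloads, overhead_cycles_per_offload):
    """Illustrative speedup estimate (hypothetical, not the paper's model).

    total_cycles: host CPU cycles spent by the unaccelerated service.
    kernel_fraction: share of those cycles spent in the offloaded building block.
    accel_factor: how many times faster the accelerator runs that block.
    offloads: number of offload invocations in the measured interval.
    overhead_cycles_per_offload: assumed fixed cost added by each offload.
    """
    if total_cycles <= 0 or accel_factor <= 0:
        raise ValueError("total_cycles and accel_factor must be positive")
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")

    # Cycles that the accelerator does not touch.
    non_kernel = (1.0 - kernel_fraction) * total_cycles
    # Cycles for the offloaded block after acceleration.
    accelerated_kernel = kernel_fraction * total_cycles / accel_factor
    # Assumption: overhead is purely additive and not overlapped with other work.
    overhead = offloads * overhead_cycles_per_offload

    return total_cycles / (non_kernel + accelerated_kernel + overhead)


# Hypothetical inputs, chosen only to show the effect of overhead.
ideal = projected_speedup(1e9, 0.2, 10.0, 0, 0)
with_overhead = projected_speedup(1e9, 0.2, 10.0, 100_000, 2_000)
print(f"no overhead: {ideal:.3f}x, with overhead: {with_overhead:.3f}x")
```

With these hypothetical inputs, the overhead-free estimate is above 1, while the estimate that charges a fixed cost to each of many small offloads drops below 1, meaning the accelerated service would be slower than the original. This is the kind of performance bound the abstract describes identifying early in the hardware design phase.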

Related Publications


MLSys - March 1, 2020

Predictive Precompute with Recurrent Neural Networks

Hanson Wang, Zehui Wang, Yuanyuan Ma

ACM SIGCOMM - October 26, 2020

Zero Downtime Release: Disruption-free Load Balancing of a Multi-Billion User Website

Usama Naseer, Luca Niccolini, Udip Pant, Alan Frindell, Ranjeeth Dasineni, Theophilus A. Benson

FL-ICML - September 1, 2020

ResiliNet: Failure-Resilient Inference in Distributed Neural Networks

Ashkan Yousefpour, Brian Q. Nguyen, Siddartha Devic, Guanhua Wang, Aboudy Kreidieh, Hans Lobel, Alexandre M. Bayen, Jason P. Jue

OSDI - November 4, 2020

The CacheLib Caching Engine: Design and Experiences at Scale

Benjamin Berg, Daniel S. Berger, Sara McAllister, Isaac Grosof, Sathya Gunasekar, Jimmy Lu, Michael Uhlar, Jim Carrig, Nathan Beckmann, Mor Harchol-Balter, Gregory G. Ganger
