
Accelerometer: Understanding Acceleration Opportunities for Data Center Overheads at Hyperscale

Architectural Support for Programming Languages and Operating Systems (ASPLOS)


Abstract

At global user population scale, important microservices in warehouse-scale data centers can grow to account for an enormous installed base of servers. With the end of Dennard scaling, successive server generations running these microservices exhibit diminishing performance returns. Hence, it is imperative to understand how important microservices spend their CPU cycles to determine acceleration opportunities across the global server fleet. To this end, we first undertake a comprehensive characterization of the top seven microservices that run on the compute-optimized data center fleet at Facebook.

Our characterization reveals that microservices spend as few as 18% of CPU cycles executing core application logic (e.g., performing a key-value store); the remaining cycles are spent in common operations that are not core to the application logic (e.g., I/O processing, logging, and compression). Accelerating such common building blocks can greatly improve data center performance. While developing specialized hardware acceleration for each building block might be beneficial, it becomes risky at scale if these accelerators do not yield the expected gains due to performance bounds precipitated by offload-induced overheads. To identify such performance bounds early in the hardware design phase, we develop an analytical model, Accelerometer, for hardware acceleration that projects realistic speedup in microservices. We validate Accelerometer’s utility in production using three retrospective case studies and demonstrate how it estimates the real speedup with ≤ 3.7% error. We then use Accelerometer to project gains from accelerating important common building blocks identified by our characterization.
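To make the offload-bound intuition concrete, below is a minimal back-of-the-envelope sketch in Python of an Amdahl-style speedup projection that charges a fixed offload-induced cost against the accelerated kernel. This is an illustration under assumed parameters, not the paper's actual Accelerometer model (which reasons more carefully about how offload overheads interact with the host CPU); the function name and all parameter values are hypothetical.

```python
# Illustrative sketch only: a simple Amdahl-style offload model with an
# offload-induced cost term. All names and numbers here are assumptions
# for exposition, not the Accelerometer model from the paper.

def projected_speedup(offloaded_fraction: float,
                      acceleration: float,
                      offload_overhead: float) -> float:
    """Project whole-service speedup from accelerating one building block.

    offloaded_fraction: share of total baseline CPU cycles spent in the
        kernel being offloaded (e.g., 0.30 for compression in a service).
    acceleration: how much faster the accelerator runs the kernel than
        the CPU does (e.g., 20.0 for a 20x-faster offload engine).
    offload_overhead: offload-induced cost (dispatch, data movement,
        kernel crossings) as a fraction of total baseline cycles.
    """
    # Baseline execution time is normalized to 1.0.
    residual = 1.0 - offloaded_fraction              # work left on the CPU
    accelerated = offloaded_fraction / acceleration  # kernel on accelerator
    return 1.0 / (residual + accelerated + offload_overhead)


if __name__ == "__main__":
    # A kernel consuming 30% of cycles, a 20x accelerator, and a 5%
    # offload overhead yield only ~1.31x end to end, below the naive
    # Amdahl bound of ~1.40x with no overhead term.
    print(round(projected_speedup(0.30, 20.0, 0.05), 2))
```

Even this toy model shows why offload-induced overheads can bound realistic gains well below an accelerator's raw kernel speedup, which is the class of risk Accelerometer is designed to expose early in the hardware design phase.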

