Thin Servers with Smart Pipes: Designing SoC Accelerators for Memcached

ACM/IEEE International Symposium on Computer Architecture (ISCA)


Distributed in-memory key-value stores, such as memcached, are central to the scalability of modern internet services. Current deployments use commodity servers with high-end processors. However, given the cost-sensitivity of internet services and the recent proliferation of volume low-power System-on-Chip (SoC) designs, we see an opportunity for alternative architectures.

We undertake a detailed characterization of memcached to reveal performance and power inefficiencies. Our study considers both high-performance and low-power CPUs and NICs across a variety of carefully-designed benchmarks that exercise the range of memcached behavior. We discover that, regardless of CPU microarchitecture, memcached execution is remarkably inefficient, saturating neither network links nor available memory bandwidth. Instead, we find performance is typically limited by the per-packet processing overheads in the NIC and OS kernel – long code paths limit CPU performance due to poor branch predictability and instruction fetch bottlenecks.

Our insights suggest that neither high-performance nor low-power cores provide a satisfactory power-performance trade-off, and point to a need for tighter integration of the network interface. Hence, we argue for an alternate architecture – Thin Servers with Smart Pipes (TSSP) – for cost-effective high-performance memcached deployment. TSSP couples an embedded-class low-power core to a memcached accelerator that can process GET requests entirely in hardware, offloading both network handling and data look up. We demonstrate the potential benefits of our TSSP architecture through an FPGA prototyping platform, and show the potential for a 6X-16X power-performance improvement over conventional server baselines.

Related Publications

All Publications

POPL - January 16, 2022

Concurrent Incorrectness Separation Logic

Azalea Raad, Josh Berdine, Derek Dreyer, Peter O'Hearn

HOTI - November 1, 2021

Scalable Distributed Training of Recommendation Models: An ASTRA-SIM + NS3 case-study with TCP/IP transport

Saeed Rashidi, Pallavi Shurpali, Srinivas Sridharan, Naader Hassani, Dheevatsa Mudigere, Krishnakumar Nair, Misha Smelyanskiy, Tushar Krishna

ICSE - November 17, 2021

Automatic Testing and Improvement of Machine Translation

Zeyu Sun, Jie M. Zhang, Mark Harman, Mike Papadakis, Lu Zhang

ACM OOPSLA - October 22, 2021

VESPA: Static Profiling for Binary Optimization

Angélica Aparecida Moreira, Guilherme Ottoni, Fernando Magno Quintão Pereira

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookie Policy