Storage and Performance Optimization of Long Tail Key Access in a Social Network

International Workshop on Cloud Data and Platforms (Cloud DP)


In a social network, it is natural to have hot objects such as a celebrity’s Facebook page. Duplicating hot object data in each cluster provides quick cache access and avoids stressing a single server’s network or CPU resources. But duplicating cold data in each cache cluster consumes significant RAM. A more storage efficient way is to separate hot data from cold data and duplicate only hot data in each cache cluster within a data center. The cold data, or the long tail data, which is accessed much less frequently, has only one copy at a regional cache cluster.

In this paper, a new sampling technique to capture all accesses to the same sampled keys is created. We then calculate the working set size for each key family for estimating the memory footprint. We introduce an important metric, duplication factor, as the ratio between the sum of each individual cluster’s working set size and the regional working set size. We analyze why some key families have a higher duplication factor.

It is important to separate hot keys and cold keys from the same key family with minimal overhead. We present a novel cache promotion algorithm based on key access probability. We also proposed a probability model based on the binomial distribution to predict the promotion probability with various promotion thresholds.

Our experiment shows by shrinking the cluster level cache layer and having a fat regional level cache for cold data, we are able to achieve a higher combined cache hit ratio.

Related Publications

All Publications

POPL - January 16, 2022

Concurrent Incorrectness Separation Logic

Azalea Raad, Josh Berdine, Derek Dreyer, Peter O'Hearn

HOTI - November 1, 2021

Scalable Distributed Training of Recommendation Models: An ASTRA-SIM + NS3 case-study with TCP/IP transport

Saeed Rashidi, Pallavi Shurpali, Srinivas Sridharan, Naader Hassani, Dheevatsa Mudigere, Krishnakumar Nair, Misha Smelyanskiy, Tushar Krishna

ICSE - November 17, 2021

Automatic Testing and Improvement of Machine Translation

Zeyu Sun, Jie M. Zhang, Mark Harman, Mike Papadakis, Lu Zhang

ACM OOPSLA - October 22, 2021

VESPA: Static Profiling for Binary Optimization

Angélica Aparecida Moreira, Guilherme Ottoni, Fernando Magno Quintão Pereira

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookie Policy