October 4, 2010

Finding a needle in Haystack: Facebook’s photo storage

USENIX Symposium on Operating Systems Design and Implementation (OSDI)

By: Doug Beaver, Sanjeev Kumar, Harry Li, Jason Sobel, Peter Vajgel

Abstract

This paper describes Haystack, an object storage system optimized for Facebook’s Photos application. Facebook currently stores over 260 billion images, which translates to over 20 petabytes of data. Users upload one billion new photos (∼60 terabytes) each week and Facebook serves over one million images per second at peak.

Haystack provides a less expensive and higher-performing solution than our previous approach, which leveraged network-attached storage appliances over NFS. Our key observation is that this traditional design incurs an excessive number of disk operations because of metadata lookups.

We carefully reduce this per-photo metadata so that Haystack storage machines can perform all metadata lookups in main memory. This choice conserves disk operations for reading actual data and thus increases overall throughput.
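
The idea of keeping all metadata lookups in main memory can be illustrated with a minimal sketch. It is not Haystack's actual on-disk format or API: the class name VolumeSketch, the single-file layout, and the (offset, size) index entries are assumptions made for illustration only. Many photos are appended to one large file, and a small in-memory dictionary records where each one lives, so a read needs no disk access to locate the photo and only touches the disk to fetch its bytes.

```python
import os
import tempfile


class VolumeSketch:
    """Hypothetical append-only photo file with an in-memory index.

    Illustrative only; this is not Haystack's real format or interface.
    """

    def __init__(self, path):
        self.path = path
        # photo_id -> (offset, size). Kept entirely in RAM, so locating
        # a photo never touches the disk.
        self.index = {}
        # "a+b" creates the file if needed and allows reads plus appends.
        self.file = open(path, "a+b")

    def put(self, photo_id, data):
        """Append the photo bytes and remember where they were written."""
        self.file.seek(0, os.SEEK_END)
        offset = self.file.tell()
        self.file.write(data)
        self.file.flush()
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        """Look up the location in memory, then read the bytes from disk."""
        offset, size = self.index[photo_id]  # memory-only metadata lookup
        self.file.seek(offset)
        return self.file.read(size)

    def close(self):
        self.file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        volume = VolumeSketch(os.path.join(tmp, "volume.dat"))
        volume.put(1, b"first photo bytes")
        volume.put(2, b"second photo bytes")
        print(volume.get(1))
        volume.close()
```

Keeping each index entry small, here just two integers per photo, is what would make it plausible to hold the index for a very large number of photos in memory. By contrast, a design that stores one file per photo has to consult filesystem metadata on each lookup, and that metadata may not be cached.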