May 7, 2013

Facebook’s Data Center Network Architecture

IEEE Optical Interconnects Conference (OI)

We review Facebook’s current data center network architecture and explore some alternative architectures.

By: Nathan Farrington, Alexey Andreyev

May 1, 2013

Hash Bit Selection: a Unified Solution for Selection Problems in Hashing

Conference on Computer Vision and Pattern Recognition (CVPR)

Hashing-based methods have recently shown promise for large-scale nearest neighbor search. However, good designs involve difficult decisions about many unknowns: data features, hashing algorithms, parameter settings, kernels, etc.

By: Xianglong Liu, Junfeng He, Bo Lang, Shih-Fu Chang

April 26, 2013

Storage and Performance Optimization of Long Tail Key Access in a Social Network

International Workshop on Cloud Data and Platforms (Cloud DP)

In a social network, it is natural to have hot objects such as a celebrity’s Facebook page. Duplicating hot object data in each cluster provides quick cache access and avoids stressing a single server’s network or CPU resources. But duplicating cold data in each cache cluster consumes significant RAM. A more storage-efficient approach is to separate hot data from cold data and duplicate only hot data in each cache cluster within a data center. The cold data, or long tail data, which is accessed much less frequently, has only one copy, held at a regional cache cluster.

By: John Liang, Yang 'James' Luo, Mark Drayton, Rajesh Nishtala, Richard Liu, Nick Hammer, Jason Taylor, Bill Jia
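
The hot/cold split described in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the `TieredCache` class, the access-count promotion rule, and the `hot_threshold` parameter are all assumptions made for illustration. Cold keys are held once in a regional pool, and a key that is read often enough is copied into every cluster.

```python
from collections import Counter


class TieredCache:
    """Toy model: hot keys replicated per cluster, cold keys stored once regionally.

    The promotion rule (a simple access counter) is a hypothetical stand-in,
    not the policy used in the paper.
    """

    def __init__(self, cluster_names, hot_threshold):
        self.clusters = {name: {} for name in cluster_names}  # per-cluster hot replicas
        self.regional = {}                                     # single copy of cold data
        self.hits = Counter()                                  # reads seen per key
        self.hot_threshold = hot_threshold

    def put(self, key, value):
        if any(key in replica for replica in self.clusters.values()):
            # Key is already hot: keep every replica in sync.
            for replica in self.clusters.values():
                replica[key] = value
        else:
            self.regional[key] = value

    def get(self, cluster, key):
        self.hits[key] += 1
        local = self.clusters[cluster]
        if key in local:
            return local[key]                # fast path: local hot replica
        value = self.regional[key]           # slow path: regional pool (KeyError if absent)
        if self.hits[key] >= self.hot_threshold:
            # Promote: move the key out of the regional pool into every cluster.
            del self.regional[key]
            for replica in self.clusters.values():
                replica[key] = value
        return value

    def copies(self, key):
        """Number of cached copies of `key` across all tiers."""
        in_clusters = sum(key in replica for replica in self.clusters.values())
        return in_clusters + (key in self.regional)


cache = TieredCache(["cluster_a", "cluster_b", "cluster_c"], hot_threshold=3)
cache.put("celebrity_page", "profile data")
cache.put("rare_page", "profile data")
for _ in range(3):
    cache.get("cluster_a", "celebrity_page")
cache.get("cluster_b", "rare_page")
```

In this toy run the frequently read key ends up with one copy per cluster, while the rarely read key keeps its single regional copy, which is the RAM saving the abstract points to.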

April 3, 2013

Scaling Memcache at Facebook

USENIX Symposium on Networked Systems Design and Implementation (NSDI)

Memcached is a well-known, simple, in-memory caching solution. This paper describes how Facebook leverages memcached as a building block to construct and scale a distributed key-value store that supports the world’s largest social network.

By: Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry Li, Ryan McElroy, Michael Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, Venkat Venkataramani

February 6, 2013

Characterizing and Curating Conversation Threads: Expansion, Focus, Volume, Re-entry

ACM International Conference on Web Search and Data Mining (WSDM)

Discussion threads form a central part of the experience on many Web sites, including social networking sites such as Facebook and Google Plus and knowledge creation sites such as Wikipedia.

By: Lars Backstrom, Jon Kleinberg, Lillian Lee, Cristian Danescu-Niculescu-Mizil

October 24, 2012

The HipHop Compiler for PHP

ACM International Conference on Object Oriented Programming Systems, Languages, and Applications (OOPSLA)

Scripting languages are widely used to quickly accomplish a variety of tasks because of the high productivity they enable. Among other reasons, this increased productivity results from a combination o…

By: Haiping Zhao, Minghui Yang, Xin Qi, Mark Williams, Charlie Gao, Guilherme Ottoni, Drew Paroski, Scott MacVicar, Jason Evans, Stephen Tu

August 16, 2012

Workload Analysis of a Large-Scale Key-Value Store

ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS)

Key-value stores are a vital component in many scale-out enterprises, including social networks, online retail, and risk analysis. Accordingly, they are receiving increased attention from the research community in an effort to improve their performance, scalability, reliability, cost, and power consumption. To be effective, such efforts require a detailed understanding of realistic key-value workloads. And yet little is known about these workloads outside of the companies that operate them. This paper aims to address this gap.

By: Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, Michael Paleczny

August 13, 2012

DeTail: Reducing the Flow Completion Time Tail in Datacenter Networks

ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM)

Web applications have now become so sophisticated that rendering a typical page may require hundreds of intra-datacenter flows. At the same time, web sites must meet strict page creation deadlines of 200-300ms to satisfy user demands for interactivity. Long-tailed flow completion times make it challenging for web sites to meet these constraints. They are forced to choose between rendering a subset of the complex page and delaying its rendering, thus missing deadlines and sacrificing either quality or responsiveness. Either option leads to potential financial loss.

By: Dhruba Borthakur, Randy Katz, Prashanth Mohan, Tathagata Das, David Zats

August 12, 2012

Active Sampling for Entity Matching

ACM Conference on Knowledge Discovery and Data Mining (KDD)

In entity matching, a fundamental issue when training a classifier to label pairs of entities as either duplicates or non-duplicates is selecting informative examples. Although active learning presents an attractive solution to this problem, previous approaches minimize the misclassification rate (0-1 loss) of the classifier, which is an unsuitable metric for entity matching due to class imbalance (i.e., many more non-duplicate pairs than duplicate pairs).

By: Kedar Bellare, Suresh Iyengar Parthasarathy, Aditya Parameswaran, Vibhor Rastogi
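
A small worked example may help show why the misclassification rate mentioned in the abstract above is a poor fit under class imbalance. The counts below (10 duplicate pairs among 1,000) are hypothetical and chosen only for illustration. A classifier that labels every pair as a non-duplicate scores a very low 0-1 loss while finding none of the duplicates.

```python
# Hypothetical labelled pairs: 1 = duplicate, 0 = non-duplicate.
labels = [1] * 10 + [0] * 990

# A trivial classifier that always predicts "non-duplicate".
predictions = [0] * len(labels)

# Misclassification rate (0-1 loss): fraction of pairs labelled incorrectly.
error_rate = sum(p != y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the duplicate class: fraction of true duplicates that were found.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print(f"error rate: {error_rate:.3f}, duplicate recall: {recall:.3f}")
```

Here the error rate is only 1%, yet recall on duplicates is zero, so minimizing 0-1 loss alone would reward a classifier that is useless for entity matching.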

June 17, 2012

Storage Infrastructure Behind Facebook Messages: Using HBase at Scale

IEEE International Conference on Data Engineering (ICDE)

Facebook Messages, which combines messages, chat and email into a real-time conversation, is the first application in Facebook to use HBase in production.

By: Amitanand Aiyer, Mikhail Bautin, Guoqiang Jerry Chen, Pritam Damania, Prakash Khemani, Kannan Muthukkaruppan, Karthik Ranganathan, Nicolas Spiegelberg, Liyin Tang, Madhuwanti Vaidya