Filter by Research Area
Filter by Research Area
Year Published

116 Results

August 26, 2013

Unicorn: A System for Searching the Social Graph

International Conference on Very Large Data Bases (VLDB)

Unicorn is an online, in-memory social graph-aware indexing system designed to search trillions of edges between tens of billions of users and entities on thousands of commodity servers. Unicorn is based on standard concepts in information retrieval, but it includes features to promote results with good social proximity. It also supports queries that require multiple round-trips to leaves in order to retrieve objects that are more than one edge away from source nodes.

By: Mike Curtiss, Iain Becker, Tudor Bosman, Sergey Doroshenko, Lucian Adrian Grijincu, Tom Jackson, Sandhya Kunnatur, Soren Lassen, Philip Pronin, Sriram Sankar, Guanghao Shen, Gintaras Woss, Chao Yang, Ning Zhang
August 26, 2013

XORing Elephants: Novel Erasure Codes for Big Data

International Conference on Very Large Data Bases (VLDB)

Distributed storage systems for large clusters typically use replication to provide reliability. Recently, erasure codes have been used to reduce the large storage overhead of three-replicated systems. Reed-Solomon codes are the standard design choice and their high repair cost is often considered an unavoidable price to pay for high storage efficiency and high reliability.

By: Maheshwaran Sathiamoorthy, Megasthenis Asteris, Dimitris Papailiopoulos, Alexandros G. Dimakis, Ramkumar Vadali, Scott Chen, Dhruba Borthakur
August 11, 2013

Speeding up Large-Scale Learning with a Social Prior

ACM Conference on Knowledge Discovery and Data Mining (KDD)

Slow convergence and poor initial accuracy are two problems that plague efforts to use very large feature sets in online learning. This is especially true when only a few features are ‘active’ in any…

By: Deepayan Chakrabarti, Ralf Herbrich
July 23, 2013

Semantic Hashing using Tags and Topic Modeling

ACM Special Interest Group on Information Retrieval Conference (SIGIR)

It is an important research problem to design efficient and effective solutions for large scale similarity search. One popular strategy is to represent data examples as compact binary codes through semantic hashing, which has produced promising results with fast search speed and low storage cost. Many existing semantic hashing methods generate binary codes for documents by modeling document relationships based on similarity in a keyword feature space. Two major limitations in existing methods are: (1) Tag information is often associated with documents in many real world applications, but has not been fully exploited yet; (2) The similarity in keyword feature space does not fully reflect semantic relationships that go beyond keyword matching.

By: Qifan Wang, Dan Zhang, Luo Si
July 14, 2013

TAO: Facebook’s Distributed Data Store for the Social Graph

USENIX Annual Technical Conference 2013

We introduce a simple data model and API tailored for serving the social graph, and TAO, an implementation of this model.

By: Nathan Bronson, Zach Amsden, George Cabrera, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Sachin Kulkarni, Harry Li, Mark Marchukov, Dmitri Petrov, Lovro Puzar, Yee Jiun Song, Venkat Venkataramani
July 1, 2013

Development and Deployment at Facebook

IEEE Internet Computing 17(4)

More than one billion users log in to Facebook at least once a month to connect and share content with each other. Among other activities, these users upload over 2.5 billion content items every day.

By: Dror Feitelson, Eitan Frachtenberg, Kent Beck
June 26, 2013

A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster

USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage)

Erasure codes, such as Reed-Solomon (RS) codes, are being increasingly employed in data centers to combat the cost of reliably storing large amounts of data. Although these codes provide optimal stora…

By: K.V. Rashmi, Nihar B. Shah, Dikang Gu, Hairong Kuang, Dhruba Borthakur, Kannan Ramchandran
June 26, 2013

TAO: Facebook’s Distributed Data Store for the Social Graph

USENIX Annual Technical Conference (ATC)

We introduce a simple data model and API tailored for serving the social graph, and TAO, an implementation of this model. TAO is a geographically distributed data store that provides efficient and timely access to the social graph for Facebook’s demanding workload using a fixed set of queries. It is deployed at Facebook, replacing memcache for many data types that fit its model.

By: Nathan Bronson, Zachary Amsden, George Cabrera III, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Sachin Kulkarni, Harry Li, Mark Marchukov, Dmitri Petrov, Lovro Puzar, Yee Jiun Song, Venkat Venkataramani
June 23, 2013

Thin Servers with Smart Pipes: Designing SoC Accelerators for Memcached

ACM/IEEE International Symposium on Computer Architecture (ISCA)

Distributed in-memory key-value stores, such as memcached, are central to the scalability of modern internet services. Current deployments use commodity servers with high-end processors. However, give…

By: Kevin Lim, David Meisner, Ali Saidi, Parthasarathy Ranganathan, Thomas Wenisch
June 22, 2013

LinkBench: a Database Benchmark based on the Facebook Social Graph

ACM Special Interest Group on Management of Data (SIGMOD/PODS)

Database benchmarks are an important tool for database researchers and practitioners that ease the process of making informed comparisons between different database hardware, software and configuratio…

By: Timothy Armstrong, Nagavamsi Ponnekanti, Dhruba Borthakur, Mark Callaghan