April 2, 2014

Libra: Divide and Conquer to Verify Forwarding Tables in Huge Networks

USENIX Symposium on Networked Systems Design and Implementation (NSDI)

Data center networks often have errors in their forwarding tables, causing packets to loop indefinitely, fall into black holes, or simply get dropped before they reach the correct destination. Finding fo…

By: James Hongyi Zeng, Shidong Zhang, Fei Ye, Vimal Kumar, Mickey Ju, Junda Liu, Nick McKeown, Amin Vahdat

February 17, 2014

Analysis of HDFS Under HBase: A Facebook Messages Case Study

USENIX Conference on File Storage Technologies (FAST)

We present a multilayer study of the Facebook Messages stack, which is based on HBase and HDFS. We collect and analyze HDFS traces to identify potential improvements, which we then evaluate via simulation.

By: Tyler Harter, Dhruba Borthakur, Siying Dong, Amitanand Aiyer, Liyin Tang, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau

January 22, 2014

The Essence of Reynolds

ACM Symposium on Principles of Programming Languages (POPL)

John Reynolds (1935-2013) was a pioneer of programming languages research. In this paper we pay tribute to the man, his ideas, and his influence.

By: Stephen Brookes, Peter O'Hearn, Uday S. Reddy

November 4, 2013

An Analysis of Facebook Photo Caching

ACM Symposium on Operating Systems Principles (SOSP)

This paper examines the workload of Facebook’s photo-serving stack and the effectiveness of the many layers of caching it employs. Facebook’s image-management infrastructure is complex and geographically distributed. It includes browser caches on end-user systems, Edge Caches at about 20 points of presence (PoPs), an Origin Cache, and, for some kinds of images, additional caching via Akamai. The underlying image storage layer is widely distributed and spans multiple data centers.

By: Qi Huang, Ken Birman, Robbert van Renesse, Wyatt Lloyd, Sanjeev Kumar, Harry Li

October 1, 2013

Virtual Network Diagnosis as a Service

ACM Symposium on Cloud Computing (SoCC)

Today’s cloud network platforms allow tenants to construct sophisticated virtual network topologies among their VMs on a shared physical network infrastructure. However, these platforms provide little…

By: Wenfei Wu, Guohui Wang, Aditya Akella, Anees Shaikh

August 27, 2013

Scuba: Diving into Data at Facebook

International Conference on Very Large Data Bases (VLDB)

Facebook takes performance monitoring seriously. Performance issues can impact over one billion users, so we track thousands of servers, hundreds of PB of daily network traffic, hundreds of daily code…

By: Lior Abraham, John Allen, Oleksandr Barykin, Vinayak Borkar, Bhuwan Chopra, Ciprian Gerea, Dan Merl, Josh Metzler, David Reiss, Subbu Subramanian, Janet Wiener, Okay Zed

August 26, 2013

Unicorn: A System for Searching the Social Graph

International Conference on Very Large Data Bases (VLDB)

Unicorn is an online, in-memory social graph-aware indexing system designed to search trillions of edges between tens of billions of users and entities on thousands of commodity servers. Unicorn is based on standard concepts in information retrieval, but it includes features to promote results with good social proximity. It also supports queries that require multiple round-trips to leaves in order to retrieve objects that are more than one edge away from source nodes.

By: Mike Curtiss, Iain Becker, Tudor Bosman, Sergey Doroshenko, Lucian Adrian Grijincu, Tom Jackson, Soren Lassen, Philip Pronin, Guanghao Shen, Gintaras Woss, Chao Yang, Ning Zhang, Sriram Sankar

August 26, 2013

XORing Elephants: Novel Erasure Codes for Big Data

International Conference on Very Large Data Bases (VLDB)

Distributed storage systems for large clusters typically use replication to provide reliability. Recently, erasure codes have been used to reduce the large storage overhead of three-replicated systems. Reed-Solomon codes are the standard design choice and their high repair cost is often considered an unavoidable price to pay for high storage efficiency and high reliability.

By: Maheshwaran Sathiamoorthy, Megasthenis Asteris, Dimitris Papailiopoulos, Alexandros G. Dimakis, Ramkumar Vadali, Scott Chen, Dhruba Borthakur

August 11, 2013

Speeding up Large-Scale Learning with a Social Prior

ACM Conference on Knowledge Discovery and Data Mining (KDD)

Slow convergence and poor initial accuracy are two problems that plague efforts to use very large feature sets in online learning. This is especially true when only a few features are ‘active’ in any…

By: Deepayan Chakrabarti, Ralf Herbrich

July 23, 2013

Semantic Hashing using Tags and Topic Modeling

International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)

Designing efficient and effective solutions for large-scale similarity search is an important research problem. One popular strategy is to represent data examples as compact binary codes through semantic hashing, which has produced promising results with fast search speed and low storage cost. Many existing semantic hashing methods generate binary codes for documents by modeling document relationships based on similarity in a keyword feature space. Existing methods have two major limitations: (1) tag information is often associated with documents in many real-world applications but has not yet been fully exploited; (2) similarity in the keyword feature space does not fully reflect semantic relationships that go beyond keyword matching.

By: Qifan Wang, Dan Zhang, Luo Si