1012 Results

February 6, 2013

Balanced Label Propagation for Partitioning Massive Graphs

ACM International Conference on Web Search and Data Mining (WSDM)

Partitioning graphs at scale is a key challenge for any application that involves distributing a graph across disks, machines, or data centers. Graph partitioning is a very well studied problem with a…

By: Johan Ugander, Lars Backstrom

February 5, 2013

Yahtzee: An Anonymized Group Level Matching Procedure

PLOS One

Researchers often face the problem of needing to protect the privacy of subjects while also needing to integrate data that contains personal information from diverse data sources. The advent of comput…

By: Jason J. Jones, Robert Bond, Christopher J. Fariss, Jaime Settle, Adam D. I. Kramer, Cameron Marlow

January 2, 2013

Inferring Tie Strength from Online Directed Behavior

PLOS One

Some social connections are stronger than others. People have not only friends, but also best friends. Social scientists have long recognized this characteristic of social connections and researchers frequently use the term ‘tie strength’ to refer to this concept. We used online interaction data (specifically, Facebook interactions) to successfully identify real-world strong ties.

By: Jason J. Jones, Jaime Settle, Robert Bond, Christopher J. Fariss, Cameron Marlow

October 24, 2012

The HipHop Compiler for PHP

ACM International Conference on Object Oriented Programming Systems, Languages, and Applications (OOPSLA)

Scripting languages are widely used to quickly accomplish a variety of tasks because of the high productivity they enable. Among other reasons, this increased productivity results from a combination o…

By: Haiping Zhao, Minghui Yang, Xin Qi, Mark Williams, Charlie Gao, Guilherme Ottoni, Drew Paroski, Scott MacVicar, Jason Evans, Stephen Tu

September 13, 2012

A 61-million-person experiment in social influence and political mobilization

Nature

Human behaviour is thought to spread through face-to-face social networks, but it is difficult to identify social influence effects in observational studies, and it is unknown whether online social ne…

By: Robert Bond, Christopher J. Fariss, Jason J. Jones, Adam D. I. Kramer, Cameron Marlow, Jaime Settle

August 16, 2012

Workload Analysis of a Large-Scale Key-Value Store

ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS)

Key-value stores are a vital component in many scale-out enterprises, including social networks, online retail, and risk analysis. Accordingly, they are receiving increased attention from the research community in an effort to improve their performance, scalability, reliability, cost, and power consumption. To be effective, such efforts require a detailed understanding of realistic key-value workloads. And yet little is known about these workloads outside of the companies that operate them. This paper aims to address this gap.

By: Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, Michael Paleczny

August 13, 2012

DeTail: Reducing the Flow Completion Time Tail in Datacenter Networks

ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM)

Web applications have now become so sophisticated that rendering a typical page may require hundreds of intra-datacenter flows. At the same time, web sites must meet strict page creation deadlines of 200-300ms to satisfy user demands for interactivity. Long-tailed flow completion times make it challenging for web sites to meet these constraints. They are forced to choose between rendering a subset of the complex page and delaying its rendering, thus missing deadlines and sacrificing either quality or responsiveness. Either option leads to potential financial loss.

By: Dhruba Borthakur, Randy Katz, Prashanth Mohan, Tathagata Das, David Zats

August 12, 2012

Active Sampling for Entity Matching

ACM Conference on Knowledge Discovery and Data Mining (KDD)

In entity matching, a fundamental issue when training a classifier to label pairs of entities as either duplicates or non-duplicates is selecting informative examples. Although active learning presents an attractive solution to this problem, previous approaches minimize the misclassification rate (0-1 loss) of the classifier, which is an unsuitable metric for entity matching due to class imbalance (i.e., many more non-duplicate pairs than duplicate pairs).

By: Kedar Bellare, Suresh Iyengar Parthasarathy, Aditya Parameswaran, Vibhor Rastogi

June 22, 2012

Four Degrees of Separation

ACM Web Science Conference (WebSci)

Frigyes Karinthy, in his 1929 short story “Láncszemek” (in English, “Chains”), suggested that any two persons are distanced by at most six friendship links. Stanley Milgram in his famous experiments challenged people to route postcards to a fixed recipient by passing them only through direct acquaintances. Milgram found that the average number of intermediaries on the path of the postcards lay between 4.4 and 5.7, depending on the sample of people chosen.

By: Lars Backstrom, Paolo Boldi, Marco Rosa, Johan Ugander, Sebastiano Vigna

June 17, 2012

Storage Infrastructure Behind Facebook Messages: Using HBase at Scale

IEEE International Conference on Data Engineering (ICDE)

Facebook Messages, which combines messages, chat, and email into a real-time conversation, is the first application at Facebook to use HBase in production.

By: Amitanand Aiyer, Mikhail Bautin, Guoqiang Jerry Chen, Pritam Damania, Prakash Khemani, Kannan Muthukkaruppan, Karthik Ranganathan, Nicolas Spiegelberg, Liyin Tang, Madhuwanti Vaidya