August 13, 2012

DeTail: Reducing the Flow Completion Time Tail in Datacenter Networks

ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM)

Web applications have now become so sophisticated that rendering a typical page may require hundreds of intra-datacenter flows. At the same time, web sites must meet strict page creation deadlines of 200-300 ms to satisfy user demands for interactivity. Long-tailed flow completion times make it challenging for web sites to meet these constraints. They are forced to choose between rendering only a subset of the complex page and delaying its rendering until it misses the deadline, sacrificing either quality or responsiveness. Either option leads to potential financial loss.

Dhruba Borthakur, Randy Katz, Prashanth Mohan, Tathagata Das, David Zats
August 12, 2012

Active Sampling for Entity Matching

ACM Conference on Knowledge Discovery and Data Mining (KDD)

In entity matching, a fundamental issue when training a classifier to label pairs of entities as either duplicates or non-duplicates is selecting informative examples. Although active learning presents an attractive solution to this problem, previous approaches minimize the misclassification rate (0-1 loss) of the classifier, which is an unsuitable metric for entity matching due to class imbalance (i.e., many more non-duplicate pairs than duplicate pairs).

Kedar Bellare, Suresh Iyengar Parthasarathy, Aditya Parameswaran, Vibhor Rastogi
June 22, 2012

Four Degrees of Separation

ACM Web Science Conference (WebSci)

Frigyes Karinthy, in his 1929 short story “Láncszemek” (in English, “Chains”), suggested that any two persons are distanced by at most six friendship links. Stanley Milgram in his famous experiments challenged people to route postcards to a fixed recipient by passing them only through direct acquaintances. Milgram found that the average number of intermediaries on the path of the postcards lay between 4.4 and 5.7, depending on the sample of people chosen.

Lars Backstrom, Paolo Boldi, Marco Rosa, Johan Ugander, Sebastiano Vigna
June 17, 2012

Storage Infrastructure Behind Facebook Messages: Using HBase at Scale

IEEE International Conference on Data Engineering (ICDE)

Facebook Messages, which combines messages, chat and email into a real-time conversation, is the first application in Facebook to use HBase in production.

Amitanand Aiyer, Mikhail Bautin, Guoqiang Jerry Chen, Pritam Damania, Prakash Khemani, Kannan Muthukkaruppan, Karthik Ranganathan, Nicolas Spiegelberg, Liyin Tang, Madhuwanti Vaidya
June 8, 2012

Social Influence in Social Advertising: Evidence from Field Experiments

ACM Conference on Electronic Commerce (EC)

Social advertising uses information about consumers’ peers, including peer affiliations with a brand, product, organization, etc., to target ads and contextualize their display. This approach can incr…

Eytan Bakshy, Dean Eckles, Rong Yan, Itamar Rosenn
June 1, 2012

Power and Performance Evaluation of Memcached on the TILEPro64 Architecture

Sustainable Computing: Informatics and Systems

Power consumption of data centers has become an important factor in the economy and sustainability of large-scale Web services. Researchers and practitioners are spending considerable effort to characterize Web-scale workloads and evaluate their applicability to alternative, more power-efficient architectures. One such workload in particular is the caching layer, which stores expensive-to-regenerate data in fast storage to reduce service times.

Mateusz Berezecki, Eitan Frachtenberg, Michael Paleczny, Ken Steele
May 16, 2012

The spread of emotion via Facebook

ACM Conference on Human Factors in Computing Systems (CHI)

In this paper we study large-scale emotional contagion through an examination of Facebook status updates. After a user makes a status update with emotional content, their friends are significantly more likely to make a valence-consistent post.

Adam D. I. Kramer
May 1, 2012

Thermal Design in the Open Compute Datacenter

IEEE Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems (ITHERM)

The advent of Web-based services and cloud computing has instigated an explosive growth in demand for datacenters. Traditionally, Internet companies would lease datacenter space and servers from vendors that often emphasize flexibility over efficiency. But as these companies grew larger, they sought to reduce acquisition and operation costs by building their own datacenters.

Eitan Frachtenberg, Dan Lee, Marco Magarelli, Veerendra Mulay, Jay Park
April 25, 2012

PACMan: Coordinated Memory Caching for Parallel Jobs

USENIX Symposium on Networked Systems Design and Implementation (NSDI)

Data-intensive analytics on large clusters is important for modern Internet services. As machines in these clusters have large memories, in-memory caching of inputs is an effective way to speed up the…

Ganesh Ananthanarayanan, Ali Ghodsi, Andrew Wang, Dhruba Borthakur, Srikanth Kandula, Scott Shenker, Ion Stoica
April 17, 2012

Structural Diversity in Social Contagion

Proceedings of the National Academy of Sciences (PNAS)

The concept of contagion has steadily expanded from its original grounding in epidemic disease to describe a vast array of processes that spread across networks, notably social phenomena such as fads,…

Johan Ugander, Lars Backstrom, Cameron Marlow, Jon Kleinberg
April 16, 2012

The Role of Social Networks in Information Diffusion

International World Wide Web Conference (WWW)

Online social networking technologies enable individuals to simultaneously share information with any number of peers. Quantifying the causal effect of these mediums on the dissemination of informatio…

Eytan Bakshy, Itamar Rosenn, Cameron Marlow, Lada Adamic
March 1, 2012

Bootstrapping Data Arrays of Arbitrary Order

The Annals of Applied Statistics (AOAS)

In this paper we study a bootstrap strategy for estimating the variance of a mean taken over large multifactor crossed random effects data sets. We apply bootstrap reweighting independently to the lev…

Art B. Owen, Dean Eckles
March 1, 2012

Predicting Memcache Throughput using Simulation and Modeling

IEEE Symposium on Theory of Modeling and Simulation (TMS)

The current work introduces a method for predicting Memcached throughput on single-core and multi-core processors. The method is based on traces collected from a full system simulator running Memcached.

Steven Hart, Eitan Frachtenberg, Mateusz Berezecki
December 23, 2011

High-efficiency server design

ACM Conference on Supercomputing (ICS)

Large-scale datacenters consume megawatts of power and cost hundreds of millions of dollars to equip. Reducing the energy and cost footprint of servers can therefore have a substantial impact.

Eitan Frachtenberg, Ali Heydari, Hu Li, Amir Michael, Jacob Na, Avery Nisbet, Pierluigi Sarti
December 1, 2011

Performance of an online translation tool when applied to patient educational material

Journal of Hospital Medicine

We evaluate the accuracy of state-of-the-art online machine translation systems for translating patient educational material.

Raman R. Khanna, Leah S. Karliner, Matthias Eck, Eric Vittinghoff, Christopher J. Koenig, Margaret C. Fang
August 15, 2011

Phonetic Classification Using Controlled Random Walks

Conference of the International Speech Communication Association (Interspeech)

Recently, semi-supervised learning algorithms for phonetic classifiers have been proposed that have obtained promising results. Often, these algorithms attempt to satisfy learning criteria that are not inherent in the standard generative or discriminative training procedures for phonetic classifiers.

Katrin Kirchhoff, Andrei Alexandrescu
July 24, 2011

Learning Relevance from a Heterogeneous Social Network and Its Application in Online Targeting

ACM Special Interest Group on Information Retrieval (SIGIR)

The rise of social networking services in recent years presents new research challenges for matching users with interesting content. While the content-rich nature of these social networks offers many…

Chi Wang, Rajat Raina, David Fong, Ding Zhou, Jiawei Han, Greg Badros
July 21, 2011

Dimensions of Self-Expression in Facebook Status Updates

AAAI International Conference on Weblogs and Social Media (ICWSM)

We describe the dimensions along which Facebook users tend to express themselves via status updates, using a semi-automated text analysis approach, the Meaning Extraction Method (MEM).

Adam D. I. Kramer, Cindy K. Chung
July 7, 2011

Center of Attention: How Facebook Users Allocate Attention across Friends

AAAI International Conference on Weblogs and Social Media (ICWSM)

An individual’s personal network — their set of social contacts — is a basic object of study in sociology. Studies of personal networks have focused on their size (the number of contacts) and their composition (in terms of categories such as kin and co-workers). Here we propose a new measure for the analysis of personal networks, based on the way in which an individual divides his or her attention across contacts. This allows us to contrast people who focus a large fraction of their interactions on a small set of close friends with people who disperse their attention more widely.

Lars Backstrom, Eytan Bakshy, Jon Kleinberg, Thomas Lento, Itamar Rosenn
July 5, 2011

Location³: How Users Share and Respond to Location-Based Data on Social Networking Sites

AAAI International Conference on Weblogs and Social Media (ICWSM)

In August 2010 Facebook launched Places, a location-based service that allows users to check into points of interest and share their physical whereabouts with friends. The friends who see these events in their News Feed can then respond to these check-ins by liking or commenting on them.

Jonathan Chang, Eric Sun