June 23, 2013

Thin Servers with Smart Pipes: Designing SoC Accelerators for Memcached

ACM/IEEE International Symposium on Computer Architecture (ISCA)

Distributed in-memory key-value stores, such as memcached, are central to the scalability of modern internet services. Current deployments use commodity servers with high-end processors. However, give…

Kevin Lim, David Meisner, Ali Saidi, Parthasarathy Ranganathan, Thomas Wenisch
June 22, 2013

LinkBench: a Database Benchmark based on the Facebook Social Graph

ACM Special Interest Group on Management of Data (SIGMOD/PODS)

Database benchmarks are an important tool for database researchers and practitioners that ease the process of making informed comparisons between different database hardware, software and configuratio…

Tim Armstrong, Nagavamsi Ponnekanti, Dhruba Borthakur, Mark Callaghan
June 17, 2013

MILEAGE: Multiple Instance LEArning with Global Embedding

International Conference on Machine Learning (ICML)

Multiple Instance Learning (MIL) generally represents each example as a collection of instances such that the features for local objects can be better captured, whereas traditional methods typically extract a global feature vector for each example as an integral part. However, there is limited research work on investigating which of the two learning scenarios performs better.

Dan Zhang, Jingrui He, Luo Si, Richard Lawrence
June 7, 2013

Representing Documents Through Their Readers

ACM Conference on Knowledge Discovery and Data Mining (KDD)

From Twitter to Facebook to Reddit, users have become accustomed to sharing the articles they read with friends or followers on their social networks. While previous work has modeled what these shared stories say about the user who shares them, the converse question remains unexplored: what can we learn about an article from the identities of its likely readers?

Khalid El-Arini, Min Xu, Emily Fox, Carlos Guestrin
June 5, 2013

Speeding up Large-Scale Learning with a Social Prior

ACM Conference on Knowledge Discovery and Data Mining (KDD)

Slow convergence and poor initial accuracy are two problems that plague efforts to use very large feature sets in online learning. This is especially true when only a few features are ‘active’ in any…

Deepayan Chakrabarti, Ralf Herbrich
May 20, 2013

Machine Learning Paradigms for Speech Recognition: An Overview

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Automatic Speech Recognition (ASR) has historically been a driving force behind many machine learning (ML) techniques, including the ubiquitously used hidden Markov model, discriminative learning, Bay…

Li Deng, Xiao Li
May 17, 2013

Latent Credibility Analysis

International World Wide Web Conference (WWW)

A frequent problem when dealing with data gathered from multiple sources on the web (ranging from booksellers to Wikipedia pages to stock analyst predictions) is that these sources disagree, and we must decide which of their (often mutually exclusive) claims we should accept. Current state-of-the-art information credibility algorithms known as ‘fact-finders’ are transitive voting systems with rules specifying how votes iteratively flow from sources to claims and then back to sources.

Jeff Pasternack, Dan Roth
May 13, 2013

Subgraph Frequencies: Mapping the Empirical and Extremal Geography of Large Graph Collections

International World Wide Web Conference (WWW)

A growing set of on-line applications are generating data that can be viewed as very large collections of small, dense social graphs – these range from sets of social groups, events, or collaboration projects to the vast collection of graph neighborhoods in large social networks.

Johan Ugander, Lars Backstrom, Jon Kleinberg
May 8, 2013

CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks

International World Wide Web Conference (WWW)

In this paper we focus on the social network Facebook and the problem of discerning ill-gotten Page Likes, made by spammers hoping to turn a profit, from legitimate Page Likes. Our method, which we refer to as CopyCatch, detects lockstep Page Like patterns on Facebook by analyzing only the social graph between users and Pages and the times at which the edges in the graph (the Likes) were created.

Alex Beutel, Tom Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos
May 7, 2013

Facebook’s Data Center Network Architecture

IEEE Optical Interconnects Conference (OI)

We review Facebook’s current data center network architecture and explore some alternative architectures.

Nathan Farrington, Alexey Andreyev
May 1, 2013

Hash Bit Selection: a Unified Solution for Selection Problems in Hashing

Conference on Computer Vision and Pattern Recognition (CVPR)

Hashing-based methods have recently shown promise for large-scale nearest neighbor search. However, good designs involve difficult decisions about many unknowns – data features, hashing algorithms, parameter settings, kernels, etc.

Xianglong Liu, Junfeng He, Bo Lang, Shih-Fu Chang
April 29, 2013

Quantifying the Invisible Audience in Social Networks

ACM Conference on Human Factors in Computing Systems (CHI)

When you share content in an online social network, who is listening? Users have scarce information about who actually sees their content, making their audience seem invisible and difficult to estimate. However, understanding this invisible audience can impact both science and design, since perceived audiences influence content production and self-presentation online.

Michael Bernstein, Eytan Bakshy, Moira Burke, Brian Karrer
April 27, 2013

Gender, Topic, and Audience Response: An Analysis of User-Generated Content on Facebook

ACM Conference on Human Factors in Computing Systems (CHI)

Although users generate a large volume of text on Facebook every day, we know little about the topics they choose to talk about, and how their network responds. Using Latent Dirichlet Allocation (LDA)…

Yi-Chia Wang, Moira Burke, Robert Kraut
April 26, 2013

Storage and Performance Optimization of Long Tail Key Access in a Social Network

International Workshop on Cloud Data and Platforms (Cloud DP)

In a social network, it is natural to have hot objects such as a celebrity’s Facebook page. Duplicating hot object data in each cluster provides quick cache access and avoids stressing a single server’s network or CPU resources. But duplicating cold data in each cache cluster consumes significant RAM. A more storage-efficient approach is to separate hot data from cold data and duplicate only hot data in each cache cluster within a data center. The cold data, or long-tail data, which is accessed much less frequently, has only one copy at a regional cache cluster.

John Liang, Yang 'James' Luo, Mark Drayton, Rajesh Nishtala, Richard Liu, Nick Hammer, Jason Taylor, Bill Jia
April 3, 2013

Scaling Memcache at Facebook

USENIX Symposium on Networked Systems Design and Implementation (NSDI)

Memcached is a well-known, simple, in-memory caching solution. This paper describes how Facebook leverages memcached as a building block to construct and scale a distributed key-value store that supports the world’s largest social network.

Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry Li, Ryan McElroy, Michael Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, Venkat Venkataramani
April 1, 2013

Using Facebook after Losing a Job: Differential Benefits of Strong and Weak Ties

ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW)

Among those who have recently lost a job, social networks in general and online ones in particular may be useful to cope with stress and find new employment. This study focuses on the psychological an…

Moira Burke, Robert Kraut
February 6, 2013

Arrival and Departure Dynamics in Social Networks

ACM International Conference on Web Search and Data Mining (WSDM)

In this paper, we consider the natural arrival and departure of users in a social network, and ask whether the dynamics of arrival, which have been studied in some depth, also explain the dynamics of departure, which are not as well studied.

Shaomei Wu, Atish Das Sarma, Alex Fabrikant, Silvio Lattanzi, Andrew Tomkins
February 6, 2013

Balanced Label Propagation for Partitioning Massive Graphs

ACM International Conference on Web Search and Data Mining (WSDM)

Partitioning graphs at scale is a key challenge for any application that involves distributing a graph across disks, machines, or data centers. Graph partitioning is a very well studied problem with a…

Johan Ugander, Lars Backstrom
February 6, 2013

Characterizing and Curating Conversation Threads: Expansion, Focus, Volume, Re-entry

ACM International Conference on Web Search and Data Mining (WSDM)

Discussion threads form a central part of the experience on many Web sites, including social networking sites such as Facebook and Google Plus and knowledge creation sites such as Wikipedia.

Lars Backstrom, Jon Kleinberg, Lillian Lee, Cristian Danescu-Niculescu-Mizil
February 5, 2013

Yahtzee: An Anonymized Group Level Matching Procedure


Researchers often face the problem of needing to protect the privacy of subjects while also needing to integrate data that contains personal information from diverse data sources. The advent of comput…

Jason J. Jones, Robert Bond, Christopher J. Fariss, Jaime Settle, Adam D. I. Kramer, Cameron Marlow