July 9, 2013

Calling All Facebook Friends: Exploring requests for help on Facebook

AAAI Conference on Weblogs and Social Media (ICWSM)

Past research suggests Facebook use is linked to perceptions of social capital, a concept that taps into the resources people gain from interactions with their social network. In this study, we examin…

Nicole Ellison, Rebecca Gray, Jessica Vitak, Cliff Lampe, Andrew Tresolini Fiore
July 8, 2013

The Anatomy of Large Facebook Cascades

AAAI Conference on Weblogs and Social Media (ICWSM)

When users post photos on Facebook, they have the option of allowing their friends, followers, or anyone at all to subsequently reshare the photo. A portion of the billions of photos posted to Facebook generates cascades of reshares, enabling many additional users to see, like, comment, and reshare the photos.

Alex Dow, Lada Adamic, Adrien Friggeri
July 8, 2013

Families on Facebook

AAAI Conference on Weblogs and Social Media (ICWSM)

This descriptive study of millions of US Facebook users documents “friending” and communication patterns, exploring parent-child relationships across a variety of life stages and gender combinations.

Moira Burke, Lada Adamic, Karyn Marciniak
July 2, 2013

Self-censorship on Facebook

AAAI Conference on Weblogs and Social Media (ICWSM)

We report results from an exploratory analysis examining “last-minute” self-censorship, or content that is filtered after being written, on Facebook. We collected data from 3.9 million users over 17 days and associate self-censorship behavior with features describing users, their social graph, and the interactions between them.

Sauvik Das, Adam D. I. Kramer
July 1, 2013

Development and Deployment at Facebook

IEEE Internet Computing 17(4)

More than one billion users log in to Facebook at least once a month to connect and share content with each other. Among other activities, these users upload over 2.5 billion content items every day.

Dror Feitelson, Eitan Frachtenberg, Kent Beck
June 26, 2013

TAO: Facebook’s Distributed Data Store for the Social Graph

USENIX Annual Technical Conference (ATC)

We introduce a simple data model and API tailored for serving the social graph, and TAO, an implementation of this model. TAO is a geographically distributed data store that provides efficient and timely access to the social graph for Facebook’s demanding workload using a fixed set of queries. It is deployed at Facebook, replacing memcache for many data types that fit its model.

Nathan Bronson, Zachary Amsden, George Cabrera III, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Sachin Kulkarni, Harry Li, Mark Marchukov, Dmitri Petrov, Lovro Puzar, Yee Jiun Song, Venkat Venkataramani
June 26, 2013

A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster

USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage)

Erasure codes, such as Reed-Solomon (RS) codes, are being increasingly employed in data centers to combat the cost of reliably storing large amounts of data. Although these codes provide optimal stora…

K.V. Rashmi, Nihar B. Shah, Dikang Gu, Hairong Kuang, Dhruba Borthakur, Kannan Ramchandran
June 23, 2013

Thin Servers with Smart Pipes: Designing SoC Accelerators for Memcached

ACM/IEEE International Symposium on Computer Architecture (ISCA)

Distributed in-memory key-value stores, such as memcached, are central to the scalability of modern internet services. Current deployments use commodity servers with high-end processors. However, give…

Kevin Lim, David Meisner, Ali Saidi, Parthasarathy Ranganathan, Thomas Wenisch
June 22, 2013

LinkBench: a Database Benchmark based on the Facebook Social Graph

ACM Special Interest Group on Management of Data (SIGMOD/PODS)

Database benchmarks are an important tool for database researchers and practitioners that ease the process of making informed comparisons between different database hardware, software and configuratio…

Tim Armstrong, Nagavamsi Ponnekanti, Dhruba Borthakur, Mark Callaghan
June 17, 2013

MILEAGE: Multiple Instance LEArning with Global Embedding

International Conference on Machine Learning (ICML)

Multiple Instance Learning (MIL) generally represents each example as a collection of instances such that the features for local objects can be better captured, whereas traditional methods typically extract a global feature vector for each example as an integral part. However, there is limited research work on investigating which of the two learning scenarios performs better.

Dan Zhang, Jingrui He, Luo Si, Richard Lawrence
June 7, 2013

Representing Documents Through Their Readers

ACM Conference on Knowledge Discovery and Data Mining (KDD)

From Twitter to Facebook to Reddit, users have become accustomed to sharing the articles they read with friends or followers on their social networks. While previous work has modeled what these shared stories say about the user who shares them, the converse question remains unexplored: what can we learn about an article from the identities of its likely readers?

Khalid El-Arini, Min Xu, Emily Fox, Carlos Guestrin
June 5, 2013

Speeding up Large-Scale Learning with a Social Prior

ACM Conference on Knowledge Discovery and Data Mining (KDD)

Slow convergence and poor initial accuracy are two problems that plague efforts to use very large feature sets in online learning. This is especially true when only a few features are ‘active’ in any…

Deepayan Chakrabarti, Ralf Herbrich
May 20, 2013

Machine Learning Paradigms for Speech Recognition: An Overview

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Automatic Speech Recognition (ASR) has historically been a driving force behind many machine learning (ML) techniques, including the ubiquitously used hidden Markov model, discriminative learning, Bay…

Li Deng, Xiao Li
May 17, 2013

Latent Credibility Analysis

International World Wide Web Conference (WWW)

A frequent problem when dealing with data gathered from multiple sources on the web (ranging from booksellers to Wikipedia pages to stock analyst predictions) is that these sources disagree, and we must decide which of their (often mutually exclusive) claims we should accept. Current state-of-the-art information credibility algorithms known as ‘fact-finders’ are transitive voting systems with rules specifying how votes iteratively flow from sources to claims and then back to sources.

Jeff Pasternack, Dan Roth
May 13, 2013

Subgraph Frequencies: Mapping the Empirical and Extremal Geography of Large Graph Collections

International World Wide Web Conference (WWW)

A growing set of on-line applications are generating data that can be viewed as very large collections of small, dense social graphs – these range from sets of social groups, events, or collaboration projects to the vast collection of graph neighborhoods in large social networks.

Johan Ugander, Lars Backstrom, Jon Kleinberg
May 8, 2013

CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks

International World Wide Web Conference (WWW)

In this paper we focus on the social network Facebook and the problem of discerning ill-gotten Page Likes, made by spammers hoping to turn a profit, from legitimate Page Likes. Our method, which we refer to as CopyCatch, detects lockstep Page Like patterns on Facebook by analyzing only the social graph between users and Pages and the times at which the edges in the graph (the Likes) were created.

Alex Beutel, Tom Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos
May 7, 2013

Facebook’s Data Center Network Architecture

IEEE Optical Interconnects Conference (OI)

We review Facebook’s current data center network architecture and explore some alternative architectures.

Nathan Farrington, Alexey Andreyev
May 1, 2013

Hash Bit Selection: a Unified Solution for Selection Problems in Hashing

Conference on Computer Vision and Pattern Recognition (CVPR)

Hashing-based methods have recently shown promise for large-scale nearest neighbor search. However, good designs involve difficult decisions about many unknowns: data features, hashing algorithms, parameter settings, kernels, etc.

Xianglong Liu, Junfeng He, Bo Lang, Shih-Fu Chang
April 29, 2013

Quantifying the Invisible Audience in Social Networks

ACM Conference on Human Factors in Computing Systems (CHI)

When you share content in an online social network, who is listening? Users have scarce information about who actually sees their content, making their audience seem invisible and difficult to estimate. However, understanding this invisible audience can impact both science and design, since perceived audiences influence content production and self-presentation online.

Michael Bernstein, Eytan Bakshy, Moira Burke, Brian Karrer
April 27, 2013

Gender, Topic, and Audience Response: An Analysis of User-Generated Content on Facebook

ACM Conference on Human Factors in Computing Systems (CHI)

Although users generate a large volume of text on Facebook every day, we know little about the topics they choose to talk about, and how their network responds. Using Latent Dirichlet Allocation (LDA)…

Yi-Chia Wang, Moira Burke, Robert Kraut