February 1, 2011

Social Capital on Facebook: Differentiating Uses and Users

ACM Conference on Human Factors in Computing Systems (CHI)

Though social network site use is often treated as a monolithic activity, in which all time is equally “social” and its impact the same for all users, we examine how Facebook affects social capital depending upon: (1) types of site activities, contrasting one-on-one communication, broadcasts to wider audiences, and passive consumption of social news, and (2) individual differences among users, including social communication skill and self-esteem.

By: Moira Burke, Robert Kraut, Cameron Marlow
January 1, 2011

RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems

IEEE International Conference on Data Engineering (ICDE)

MapReduce-based data warehouse systems play an important role in supporting big data analytics, which typical Web service providers and social network sites (e.g., Facebook) rely on to quickly understand the dynamics of user behavior trends and user needs. In such a system, the data placement structure is a critical factor that can affect warehouse performance in a fundamental way.

By: Yongqiang He, Rubao Lee, Yin Huai, Zheng Shao, Namit Jain, Xiaodong Zhang, Zhiwei Xu
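
To make "data placement structure" concrete, here is a minimal Python sketch of the hybrid layout RCFile is known for. Rows are first partitioned horizontally into row groups, and inside each row group the values are stored column by column, so a query that touches one column can skip the others. The names `to_row_groups` and `scan_column`, the toy rows, and the tiny group size are hypothetical. The real format also compresses each column and keeps per-group metadata, which this sketch leaves out.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
RowGroup = Dict[str, List[Any]]  # column name -> that column's values in the group

def to_row_groups(rows: List[Row], columns: List[str], group_size: int) -> List[RowGroup]:
    """Split rows horizontally into groups, then store each group column by column."""
    groups: List[RowGroup] = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # Within a row group, each column's values are kept contiguous.
        groups.append({col: [r[col] for r in chunk] for col in columns})
    return groups

def scan_column(groups: List[RowGroup], col: str) -> List[Any]:
    """Read one column across all row groups without touching the other columns."""
    out: List[Any] = []
    for g in groups:
        out.extend(g[col])
    return out

# Hypothetical toy table.
rows = [{"user": u, "country": c, "clicks": k}
        for u, c, k in [(1, "US", 3), (2, "BR", 5), (3, "US", 1), (4, "IN", 7), (5, "US", 2)]]
groups = to_row_groups(rows, ["user", "country", "clicks"], group_size=2)
print(len(groups))                    # 3
print(scan_column(groups, "clicks"))  # [3, 5, 1, 7, 2]
```

Keeping whole rows inside one group keeps row reconstruction cheap, while the column-wise layout inside each group lets a scan read only the column it needs.
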
January 1, 2011

Supervised Random Walks: Predicting and Recommending Links in Social Networks

ACM International Conference on Web Search and Data Mining (WSDM)

Predicting the occurrence of links is a fundamental problem in networks. In the link prediction problem we are given a snapshot of a network and would like to infer which interactions among existing members are likely to occur in the near future or which existing interactions we are missing. Although this problem has been extensively studied, the challenge of how to effectively combine the information from the network structure with rich node and edge attribute data remains largely open.

By: Lars Backstrom, Jure Leskovec
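
The link prediction setting above can be illustrated with a random walk with restart. Starting from a source node, the walker follows edges and jumps back to the source with probability `alpha`. Nodes not yet linked to the source are then ranked by how often they are visited. This is a minimal, unsupervised sketch with uniform edge weights. Supervised Random Walks, as its title says, goes further by learning how to bias the walk using node and edge attributes, which is not shown here. The function names, `alpha`, and the toy graph are assumptions for illustration.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # undirected graph stored as adjacency sets

def random_walk_with_restart(graph: Graph, source: str, alpha: float = 0.15,
                             iters: int = 100) -> Dict[str, float]:
    """Power iteration for a walk that restarts at `source` with probability alpha.

    Edge weights are uniform in this sketch (a simplifying assumption).
    """
    p = {n: 0.0 for n in graph}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in graph}
        for u, nbrs in graph.items():
            if not nbrs:
                # Dangling node: send its walking mass back to the source.
                nxt[source] += (1 - alpha) * p[u]
                continue
            share = (1 - alpha) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        nxt[source] += alpha  # restart mass, so total probability stays 1
        p = nxt
    return p

def predict_links(graph: Graph, source: str, k: int = 2) -> List[str]:
    """Rank nodes not yet linked to `source` by their visiting probability."""
    scores = random_walk_with_restart(graph, source)
    candidates = [n for n in graph if n != source and n not in graph[source]]
    return sorted(candidates, key=lambda n: scores[n], reverse=True)[:k]

graph: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
print(predict_links(graph, "a"))  # ['d', 'e']
```

Node `d` outranks `e` because two short paths lead to it from `a`, while `e` is reachable only through `d`.
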
January 1, 2011

Network Bucket Testing

International World Wide Web Conference (WWW)

Bucket testing, also known as A/B testing, is a practice that is widely used by on-line sites with large audiences: in a simple version of the methodology, one evaluates a new feature on the site by e…

By: Lars Backstrom, Jure Leskovec
October 4, 2010

Finding a needle in Haystack: Facebook’s photo storage

USENIX Symposium on Operating Systems Design and Implementation (OSDI)

This paper describes Haystack, an object storage system optimized for Facebook’s Photos application. Facebook currently stores over 260 billion images, which translates to over 20 petabytes of data. U…

By: Doug Beaver, Sanjeev Kumar, Harry Li, Jason Sobel, Peter Vajgel
June 6, 2010

Data warehousing and analytics infrastructure at Facebook

Special Interest Group on Management of Data (SIGMOD)

Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook – both engineering and non-engineering. Apart from ad hoc analysis of data and creation of business intelligence dashboards by analysts across the company, a number of Facebook’s site features are also based on analyzing large data sets.

By: Ashish Thusoo, Dhruba Borthakur, Raghotham Murthy, Zheng Shao, Namit Jain, Hao Liu, Suresh Antony, Joydeep Sen Sarma
June 1, 2010

Not-so-latent Dirichlet allocation: collapsed Gibbs sampling using human judgments

Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)

Probabilistic topic models are a popular tool for the unsupervised analysis of text, providing both a predictive model of future text and a latent topic representation of the corpus. Recent studies have found that while there are suggestive connections between topic models and the way humans interpret data, these two often disagree.

By: Jonathan Chang
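
For readers unfamiliar with the sampler named in the title, here is a minimal Python sketch of textbook collapsed Gibbs sampling for LDA. Each token's topic is resampled from its conditional distribution given all other assignments, with the topic and word distributions integrated out. It does not reproduce the paper's human-judgment variant. The function name, the hyperparameter defaults, and the toy corpus are assumptions.

```python
import random
from typing import List, Tuple

def collapsed_gibbs_lda(docs: List[List[int]], vocab_size: int, num_topics: int,
                        alpha: float = 0.1, beta: float = 0.01, iters: int = 200,
                        seed: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Textbook collapsed Gibbs sampler for LDA.

    docs: each document is a list of word ids in range(vocab_size).
    Returns (doc-topic counts, topic-word counts) after the final sweep.
    """
    rng = random.Random(seed)
    ndk = [[0] * num_topics for _ in docs]               # tokens per (doc, topic)
    nkw = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    nk = [0] * num_topics                                # tokens per topic
    z: List[List[int]] = []                              # topic of each token

    def update(d: int, w: int, k: int, delta: int) -> None:
        ndk[d][k] += delta
        nkw[k][w] += delta
        nk[k] += delta

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    topics = range(num_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # remove token i from the counts
                # p(z_i = t | rest) ∝ (n_dt + alpha)(n_tw + beta) / (n_t + V * beta)
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta) for t in topics]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                update(d, w, z[d][i], +1)  # add it back under its new topic
    return ndk, nkw

# Hypothetical toy corpus: words 0-2 and 3-5 never co-occur in a document.
docs = [[0, 1, 2, 0, 1, 2], [3, 4, 5, 3, 4, 5], [0, 2, 1, 1], [4, 5, 3, 3]]
ndk, nkw = collapsed_gibbs_lda(docs, vocab_size=6, num_topics=2)
print(ndk)
```

On a corpus like this one, whose halves use disjoint vocabularies, the sampler typically concentrates each document's tokens in a single topic. The exact counts depend on the seed.
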
June 1, 2010

Tools for Collecting Speech Corpora via Mechanical Turk

NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk

When rapidly porting speech applications to new languages, one of the most difficult tasks is the initial collection of sufficient speech corpora.

By: Ian Lane, Alex Waibel, Matthias Eck, Kay Rottmann
April 26, 2010

Find Me If You Can: Improving Geographical Prediction with Social and Spatial Proximity

International World Wide Web Conference (WWW)

Geography and social relationships are inextricably intertwined; the people we interact with on a daily basis almost always live near us. As people spend more time online, data regarding these two dimensions — geography and social relationships — are becoming increasingly precise, allowing us to build reliable models to describe their interaction. These models have important implications in the design of location-based services, security intrusion detection, and social media supporting local communities.

By: Lars Backstrom, Eric Sun, Cameron Marlow
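
The observation that people tend to live near those they interact with suggests a simple predictor. Among the known locations of a user's friends, choose the one that makes the observed friendships most likely under a distance-decay model. The following Python sketch assumes friendship likelihood decays as (d + a)^-b with distance d in kilometres. The parameters `a` and `b`, the function names, and the approximate coordinates below are hypothetical, and this simplification is not the model from the paper.

```python
import math
from typing import List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees

def haversine_km(p: LatLon, q: LatLon) -> float:
    """Great-circle distance in kilometres on a spherical Earth."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def predict_home(friend_locations: List[LatLon], a: float = 1.0, b: float = 1.0) -> LatLon:
    """Pick the friend location maximizing sum of log((d + a) ** -b) over friends.

    a and b are hypothetical decay parameters, not fitted values.
    """
    def log_likelihood(c: LatLon) -> float:
        return sum(-b * math.log(haversine_km(c, f) + a) for f in friend_locations)
    return max(friend_locations, key=log_likelihood)

# Approximate coordinates: three friends in the Bay Area, one in New York.
friends = [(37.44, -122.14), (37.45, -122.18), (37.39, -122.08), (40.71, -74.01)]
print(predict_home(friends))  # (37.44, -122.14)
```

Summing log-likelihoods keeps a single distant friend from dragging the estimate away from the local cluster, and restricting candidates to friends' own locations avoids any continuous optimization. The function raises `ValueError` when no friend locations are known.
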
April 19, 2010

ePluribus: Ethnicity on Social Networks

AAAI Conference on Weblogs and Social Media (ICWSM)

We propose an approach to determine the ethnic breakdown of a population based solely on people’s names and data provided by the U.S. Census Bureau. We demonstrate that our approach is able to predict the ethnicities of individuals as well as the ethnicity of an entire population better than natural alternatives.

By: Jonathan Chang, Itamar Rosenn, Lars Backstrom, Cameron Marlow
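
A natural baseline for the task described above looks up a distribution over groups for each surname, derived from census tables. It then either takes the most likely group for an individual or averages the distributions over a population. The following Python sketch shows that baseline only, not the paper's approach. The surnames, group labels, and probabilities in it are invented placeholders, not Census Bureau data.

```python
from collections import defaultdict
from typing import Dict, List

Dist = Dict[str, float]  # group label -> probability

def predict_individual(surname: str, table: Dict[str, Dist], prior: Dist) -> str:
    """Most likely group for one surname; unknown surnames fall back to `prior`."""
    dist = table.get(surname, prior)
    return max(dist, key=dist.get)

def population_breakdown(surnames: List[str], table: Dict[str, Dist], prior: Dist) -> Dist:
    """Average the per-surname distributions to estimate the population mix."""
    if not surnames:
        return {}
    totals: Dict[str, float] = defaultdict(float)
    for s in surnames:
        for group, p in table.get(s, prior).items():
            totals[group] += p
    return {g: v / len(surnames) for g, v in totals.items()}

# Invented placeholder data, not Census Bureau figures.
table = {"name_x": {"g1": 0.9, "g2": 0.1}, "name_y": {"g1": 0.6, "g2": 0.4}}
prior = {"g1": 0.5, "g2": 0.5}
people = ["name_x", "name_y", "name_y", "unknown"]
print([predict_individual(s, table, prior) for s in people])  # ['g1', 'g1', 'g1', 'g1']
print(population_breakdown(people, table, prior))
```

Averaging soft distributions matters for the population estimate. Counting only each person's most likely group labels all four example names `g1`, whereas the averaged breakdown gives roughly 0.65 for `g1` and 0.35 for `g2`.
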