July 5, 2011

Location3: How Users Share and Respond to Location-Based Data on Social Networking Sites

AAAI International Conference on Weblogs and Social Media (ICWSM)

In August 2010 Facebook launched Places, a location-based service that allows users to check into points of interest and share their physical whereabouts with friends. The friends who see these events in their News Feed can then respond to these check-ins by liking or commenting on them.

Jonathan Chang, Eric Sun
June 20, 2011

YSmart: Yet Another SQL-to-MapReduce Translator

International Conference on Distributed Computing Systems (ICDCS)

MapReduce has become an effective approach to big data analytics in large cluster systems, where SQL-like queries play an important role as the interface between users and systems. However, based on our Facebook daily operation results, certain types of queries are executed at an unacceptably low speed by Hive (a production SQL-to-MapReduce translator). In this paper, we demonstrate that existing SQL-to-MapReduce translators that operate in a one-operation-to-one-job mode and do not consider query correlations cannot generate high-performance MapReduce programs for certain queries, due to the mismatch between complex SQL structures and the simple MapReduce framework. We propose and develop a system called YSmart, a correlation-aware SQL-to-MapReduce translator. YSmart applies a set of rules to use the minimal number of MapReduce jobs to execute multiple correlated operations in a complex query. YSmart can significantly reduce redundant computations, I/O operations, and network transfers compared to existing translators. We have implemented YSmart and evaluated it intensively on complex queries using two Amazon EC2 clusters and one Facebook production cluster. The results show that YSmart can outperform Hive and Pig, two widely used SQL-to-MapReduce translators, by more than four times for query execution.

Rubao Lee, Tian Luo, Yin Huai, Fusheng Wang, Yongqiang He, Xiaodong Zhang
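
The contrast the YSmart abstract draws between one-operation-to-one-job translation and merging correlated operations can be illustrated with a toy, in-memory simulation. This is a minimal sketch and not YSmart's actual rule set: the table `ROWS`, the helper `run_job`, and both query plans are hypothetical names invented for illustration. The query computes a per-user total and a per-user row count, two aggregations that share the same input and the same grouping key.

```python
from collections import defaultdict

# Hypothetical toy table of (user_id, clicks) rows; not data from the paper.
ROWS = [(1, 3), (2, 5), (1, 4), (3, 1), (2, 2)]


def run_job(rows, mapper, reducer, stats):
    """Simulate one MapReduce job: map each row, group by key, reduce."""
    stats["jobs"] += 1
    stats["rows_scanned"] += len(rows)
    groups = defaultdict(list)
    for row in rows:
        key, value = mapper(row)
        groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


def one_job_per_operation(rows):
    """Naive plan: each aggregation and the final join is its own job."""
    stats = {"jobs": 0, "rows_scanned": 0}
    totals = run_job(rows, lambda r: (r[0], r[1]), sum, stats)
    counts = run_job(rows, lambda r: (r[0], 1), sum, stats)

    # Tag intermediate rows so a third job can join them on user_id.
    tagged = [(k, ("total", v)) for k, v in totals.items()]
    tagged += [(k, ("count", v)) for k, v in counts.items()]

    def join(values):
        fields = dict(values)
        return (fields["total"], fields["count"])

    joined = run_job(tagged, lambda r: (r[0], r[1]), join, stats)
    return joined, stats


def merged_job(rows):
    """Merged plan: both aggregations share one scan and one shuffle."""
    stats = {"jobs": 0, "rows_scanned": 0}
    result = run_job(
        rows,
        lambda r: (r[0], r[1]),
        lambda values: (sum(values), len(values)),
        stats,
    )
    return result, stats


naive_result, naive_stats = one_job_per_operation(ROWS)
merged_result, merged_stats = merged_job(ROWS)
print(naive_result == merged_result, naive_stats, merged_stats)
```

Both plans return the same answer, but the merged plan runs a single job and scans the input once, whereas the naive plan scans the input twice and then needs a further job to join the intermediate results. The savings in a real cluster would also include avoided disk writes and network shuffles between jobs, which this sketch does not model.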
February 1, 2011

Social Capital on Facebook: Differentiating Uses and Users

ACM Conference on Human Factors in Computing Systems (CHI)

Though social network site use is often treated as a monolithic activity, in which all time is equally “social” and its impact the same for all users, we examine how Facebook affects social capital depending upon: (1) types of site activities, contrasting one-on-one communication, broadcasts to wider audiences, and passive consumption of social news, and (2) individual differences among users, including social communication skill and self-esteem.

Moira Burke, Robert Kraut, Cameron Marlow
January 1, 2011

Supervised Random Walks: Predicting and Recommending Links in Social Networks

ACM International Conference on Web Search and Data Mining (WSDM)

Predicting the occurrence of links is a fundamental problem in networks. In the link prediction problem we are given a snapshot of a network and would like to infer which interactions among existing members are likely to occur in the near future or which existing interactions we are missing. Although this problem has been extensively studied, the challenge of how to effectively combine the information from the network structure with rich node and edge attribute data remains largely open.

Lars Backstrom, Jure Leskovec
January 1, 2011

Network Bucket Testing

International World Wide Web Conference (WWW)

Bucket testing, also known as A/B testing, is a practice that is widely used by on-line sites with large audiences: in a simple version of the methodology, one evaluates a new feature on the site by e…

Lars Backstrom, Jon Kleinberg
June 1, 2010

Not-So-Latent Dirichlet Allocation: Collapsed Gibbs Sampling Using Human Judgments

Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)

Probabilistic topic models are a popular tool for the unsupervised analysis of text, providing both a predictive model of future text and a latent topic representation of the corpus. Recent studies have found that while there are suggestive connections between topic models and the way humans interpret data, these two often disagree.

Jonathan Chang
April 26, 2010

Find Me If You Can: Improving Geographical Prediction with Social and Spatial Proximity

International World Wide Web Conference (WWW)

Geography and social relationships are inextricably intertwined; the people we interact with on a daily basis almost always live near us. As people spend more time online, data regarding these two dimensions — geography and social relationships — are becoming increasingly precise, allowing us to build reliable models to describe their interaction. These models have important implications in the design of location-based services, security intrusion detection, and social media supporting local communities.

Lars Backstrom, Eric Sun, Cameron Marlow
April 19, 2010

ePluribus: Ethnicity on Social Networks

AAAI Conference on Weblogs and Social Media (ICWSM)

We propose an approach to determine the ethnic break-down of a population based solely on people’s names and data provided by the U.S. Census Bureau. We demonstrate that our approach is able to predict the ethnicities of individuals as well as the ethnicity of an entire population better than natural alternatives.

Jonathan Chang, Itamar Rosenn, Lars Backstrom, Cameron Marlow
February 1, 2010

Social Network Activity and Social Well-Being

ACM Conference on Human Factors in Computing Systems (CHI)

Previous research has shown a relationship between use of social networking sites and feelings of social capital. However, most studies have relied on self-reports by college students. The goals of the current study are to (1) validate the common self-report scale using empirical data from Facebook, (2) test whether previous findings generalize to older and international populations, and (3) delve into the specific activities linked to feelings of social capital and loneliness.

Moira Burke, Cameron Marlow, Thomas Lento
February 1, 2010

An Unobtrusive Behavioral Model of “Gross National Happiness”

ACM Conference on Human Factors in Computing Systems (CHI)

I analyze the use of emotion words for approximately 100 million Facebook users since September of 2007. “Gross national happiness” is operationalized as a standardized difference between the use of p…

Adam D. I. Kramer
June 1, 2009

Feed Me: Motivating Newcomer Contribution in Social Network Sites

ACM Conference on Human Factors in Computing Systems (CHI)

Social networking sites (SNS) are only as good as the content their users share. Therefore, designers of SNS seek to improve the overall user experience by encouraging members to contribute more content. However, user motivations for contribution in SNS are not well understood. This is particularly true for newcomers, who may not recognize the value of contribution. Using server log data from approximately 140,000 newcomers in Facebook, we predict long-term sharing based on the experiences the newcomers have in their first two weeks. We test four mechanisms: social learning, singling out, feedback, and distribution.

Moira Burke, Cameron Marlow, Thomas Lento
April 1, 2009

Gesundheit! Modeling Contagion through Facebook News Feed

AAAI Conference on Weblogs and Social Media (ICWSM)

Whether they are modeling bookmarking behavior in Flickr or cascades of failure in large networks, models of diffusion often start with the assumption that a few nodes start long chain reactions, resulting in large-scale cascades.

Eric Sun, Itamar Rosenn, Cameron Marlow, Thomas Lento