March 14, 2015

The Diffusion of Support in an Online Social Movement: Evidence from the Adoption of Equal-Sign Profile Pictures

Proc. CSCW'15

In March of 2013, 3 million Facebook users changed their profile picture to one of an equals sign to express support of same-sex marriage. We demonstrate that this action shows complex diffusion characteristics congruent with threshold models, with most users observing several of their friends changing their profile picture before taking the action themselves.

Bogdan State, Lada Adamic
February 17, 2015

What Makes for Effective Detection Proposals?


An in-depth study of object proposals and their effect on object detection performance.

Bernt Schiele, Jan Hosang, Piotr Dollar, Rodrigo Benenson
February 10, 2015

Moving Fast with Software Verification

NASA Formal Methods Symposium

For organisations like Facebook, high-quality software is important. However, the pace of change and increasing complexity of modern code make it difficult to produce error-free software. Available tools are often lacking in helping programmers develop more reliable and secure applications.

Cristiano Calcagno, Dino Distefano, Jeremy Dubreil, Dominik Gabi, Pieter Hooimeijer, Martino Luca, Peter O'Hearn, Irene Papakonstantinou, Jim Purbrick, Dulma Churchill
December 19, 2014

Predicting the Quality of New Contributors to the Facebook Crowdsourcing System

Neural Information Processing Systems: Crowdsourcing and Machine Learning Workshop

We are interested in improving the quality and coverage of a knowledge graph through crowdsourcing features built into a social networking service. This work presents an approach to model user trust when prior history is lacking.

Julian Eisenschlos
December 19, 2014

Video (language) Modeling: a Baseline for Generative Models of Natural Videos

arXiv preprint

In this work, we investigate models of natural high-resolution video sequences. We show that very simple models borrowed from language modeling applications are surprisingly effective at recovering shor…

Arthur Szlam, Joan Bruna, Marc'Aurelio Ranzato, Michael Mathieu, Ronan Collobert, Sumit Chopra
December 19, 2014

Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews

ICLR workshop 2015

In this work we ensemble several models and achieve state-of-the-art accuracy for predicting the sentiment of movie reviews in the IMDB dataset.

Gregoire Mesnil, Tomas Mikolov, Marc'Aurelio Ranzato, Yoshua Bengio
December 4, 2014

Extracting Translation Pairs from Social Network Content

International Workshop on Spoken Language Translation

We describe two methods to collect translation pairs from public Facebook content. We use the extracted translation pairs as additional training data for machine translation systems and show significant improvements.

Matthias Eck, Yury Zemlyanskiy, Joy Zhang, Alex Waibel
October 27, 2014

Characterizing Load Imbalance in Real-World Networked Caches

HotNets 2014: Thirteenth ACM Workshop on Hot Topics in Networks

Modern Web services rely extensively upon a tier of in-memory caches to reduce request latencies and alleviate load on backend servers. Within a given cache, items are typically partitioned across cache servers via consistent hashing, with the goal of balancing the number of items maintained by each cache server. Effects of consistent hashing vary by associated hashing function and partitioning ratio. Most real-world workloads are also skewed, with some items significantly more popular than others. Inefficiency in addressing both issues can create an imbalance in cache-server loads. We analyze the degree of observed load imbalance, focusing on read-only traffic against Facebook’s graph cache tier in Tao. We investigate the principal causes of load imbalance, including data co-location, non-ideal hashing scenarios, and hot-spot temporal effects. We also employ trace-driven analytics to study the benefits and limitations of current load-balancing methods, suggesting areas for future research.

Qi Huang, Helga Gudmundsdottir, Ymir Vigfusson, Daniel A. Freedman, Ken Birman, Robbert van Renesse
October 24, 2014

The HipHop Virtual Machine

ACM International Conference on Object Oriented Programming Systems, Languages, and Applications

The HipHop Virtual Machine (HHVM) is a JIT compiler and runtime for PHP. While PHP values are dynamically typed, real programs often have latent types that are useful for optimization once discovered….

Keith Adams, Jason Evans, Bertrand Maher, Guilherme Ottoni, Drew Paroski, Brett Simmers, Edwin Smith, Owen Yamauchi
October 7, 2014

f4: Facebook’s Warm BLOB Storage System

Operating Systems Design and Implementation

Facebook’s corpus of photos, videos, and other Binary Large OBjects (BLOBs) that need to be reliably stored and quickly accessible is massive and continues to grow.

Subramanian Muralidhar, Wyatt Lloyd, Sabyasachi Roy, Cory Hill, Ernest Lin, Weiwen Liu, Satadru Pan, Shiva Shankar, Viswanath Sivakumar, Linpeng Tang, Sanjeev Kumar
October 6, 2014

The Mystery Machine: End-to-end Performance Analysis of Large-scale Internet Services

Operating Systems Design and Implementation

Current debugging and optimization methods scale poorly to deal with the complexity of modern Internet services, in which a single request triggers parallel execution of numerous heterogeneous softwar…

Mike Chow, David Meisner, Jason Flinn, Daniel Peek, Thomas Wenisch
September 18, 2014

Hierarchical Cascade of Classifiers for Efficient Poselet Evaluation

British Machine Vision Conference

Poselets have been used in a variety of computer vision tasks, such as detection, segmentation, action classification, pose estimation and action recognition, often achieving state-of-the-art performa…

David Bo Chen, Pietro Perona, Lubomir Bourdev
September 18, 2014

Mining Energy Traces to Aid in Software Development: An Empirical Case Study

ACM / IEEE International Symposium on Empirical Software Engineering and Measurement

With the advent of increased computing on mobile devices such as phones and tablets, it has become crucial to pay attention to the energy consumption of mobile applications.

Ashish Gupta, Thomas Zimmermann, Christian Bird, Nachiappan Nagappan, Thirumalesh Bhat, Syed Emran
September 4, 2014

Question Answering with Subgraph Embeddings

Empirical Methods in Natural Language Processing

This paper presents a system which learns to answer questions on a broad range of topics from a knowledge base using few handcrafted features. Our model learns low-dimensional embeddings of words and knowledge base constituents; these representations are used to score natural language questions against candidate answers.

Antoine Bordes, Jason Weston, Sumit Chopra
September 4, 2014

#TagSpace: Semantic Embeddings from Hashtags

Empirical Methods in Natural Language Processing

We describe a convolutional neural network that learns feature representations for short textual posts using hashtags as a supervised signal. The proposed approach is trained on up to 5.5 billion words predicting 100,000 possible hashtags.

Jason Weston, Sumit Chopra, Keith Adams
September 1, 2014

Optimal Crowd-Powered Rating and Filtering Algorithms

VLDB 2014

We focus on crowd-powered filtering, i.e., filtering a large set of items using humans. Filtering is one of the most commonly used building blocks in crowdsourcing applications and systems. While solu…

Aditya Parameswaran, Stephen Boyd, Hector Garcia-Molina, Ashish Gupta, Neoklis Polyzotis, Jennifer Widom
August 24, 2014

Streamed Approximate Counting of Distinct Elements

ACM Conference on Knowledge Discovery and Data Mining (KDD)

Counting the number of distinct elements in a large dataset is a common task in web applications and databases. This problem is difficult in limited memory settings where storing a large hash table ta…

Daniel Ting
August 24, 2014

Practical Lessons from Predicting Clicks on Ads at Facebook

International Workshop on Data Mining for Online Advertising (ADKDD)

Online advertising allows advertisers to only bid and pay for measurable user responses, such as clicks on ads. As a consequence, click prediction systems are central to most online advertising system…

Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers, Joaquin Quinonero Candela
August 18, 2014

A Hitchhiker’s Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers

ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM)

Erasure codes such as Reed-Solomon (RS) codes are being extensively deployed in data centers since they offer significantly higher reliability than data replication methods at much lower storage overheads. These codes, however, mandate much higher resources with respect to network bandwidth and disk IO during reconstruction of data that is missing or otherwise unavailable.

K.V. Rashmi, Nihar B. Shah, Dikang Gu, Hairong Kuang, Dhruba Borthakur, Kannan Ramchandran
August 18, 2014

Fastpass: A Centralized “Zero-Queue” Datacenter Network

ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM)

Current datacenter networks inherit the principles that went into the design of the Internet, where packet transmission and path selection decisions are distributed among the endpoints and routers. In…

Jonathan Perry, Amy Ousterhout, Hari Balakrishnan, Devavrat Shah, Hans Fugal