October 4, 2015

Existential Consistency: Measuring and Understanding Consistency at Facebook

The 25th ACM Symposium on Operating Systems Principles

Replicated storage for large Web services faces a trade-off between stronger forms of consistency and higher performance properties. Stronger consistency prevents anomalies, i.e., unexpected behavior visible to users, and reduces programming complexity.

By: Haonan Lu, Kaushik Veeraraghavan, Philippe Ajoux, Jim Hunt, Yee Jiun Song, Wendy Tobagus, Sanjeev Kumar, Wyatt Lloyd
September 17, 2015

Improved Arabic Dialect Classification with Social Media Data

Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Arabic dialect classification has been an important and challenging problem for Arabic language processing, especially for social media text analysis and machine translation. In this paper we propose an approach to improving Arabic dialect classification with semi-supervised learning: multiple classifiers are trained with weakly supervised, strongly supervised, and unsupervised data. Their combination yields significant and consistent improvement on two different test sets.

By: Fei Huang
August 31, 2015

Cubrick: A Scalable Distributed MOLAP Database for Fast Analytics

41st International Conference on Very Large Data Bases (Ph.D. Workshop)

This paper describes the architecture and design of Cubrick, a distributed multidimensional in-memory database that enables real-time data analysis of large dynamic datasets. Cubrick has a strictly multidimensional data model composed of dimensions, dimensional hierarchies and metrics, supporting sub-second MOLAP operations such as slice and dice, roll-up and drill-down over terabytes of data.

By: Pedro Pedreira, Luis Erpen de Bona, Chris Croswhite
August 31, 2015

One Trillion Edges: Graph Processing at Facebook-Scale

The 41st International Conference on Very Large Data Bases

Analyzing large graphs provides valuable insights for social networking and web companies in content ranking and recommendations. While numerous graph processing systems have been developed and evaluated on available benchmark graphs of up to 6.6B edges, they often face significant difficulties in scaling to much larger graphs. Industry graphs can be two orders of magnitude larger, with hundreds of billions or up to one trillion edges.

By: Avery Ching, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis, Sambavi Muthukrishnan
August 17, 2015

Inside the Social Network’s (Datacenter) Network


Large cloud service providers have invested in increasingly larger datacenters to house the computing infrastructure required to support their services.

By: Arjun Roy, James Hongyi Zeng, Jasmeet Bagga, George Porter, Alex C. Snoeren
August 13, 2015

From Categorical Logic to Facebook Engineering

30th Annual ACM/IEEE Symposium on Logic in Computer Science

I chart a line of development from category-theoretic models of programs and logics to automatic program verification/analysis techniques that are in deployment at Facebook. Our journey takes in a number of concepts from the computer science logician’s toolkit – including categorical logic and model theory, denotational semantics, the Curry-Howard isomorphism, substructural logic, Hoare Logic and Separation Logic, abstract interpretation, compositional program analysis, the frame problem, and abductive inference.

By: Peter O'Hearn
August 12, 2015

No Regret Bound for Extreme Bandits

arXiv preprint

This paper defines a sensible notion of regret in the extreme bandit setting.

By: Robert Nishihara, David Lopez-Paz, Leon Bottou
June 25, 2015

Scale-Invariant Learning and Convolutional Networks

arXiv preprint

The conventional classification schemes — notably multinomial logistic regression — used in conjunction with convolutional networks (convnets) are classical in statistics, designed without consideration for the usual coupling with convnets, stochastic gradient descent, and backpropagation. In the specific application to supervised learning for convnets, a simple scale-invariant classification stage turns out to be more robust than multinomial logistic regression, appears to result in slightly lower errors on several standard test sets, has similar computational costs, and features precise control over the actual rate of learning.

By: Mark Tygert, Arthur Szlam, Soumith Chintala, Marc'Aurelio Ranzato, Yuandong Tian, Wojciech Zaremba
June 22, 2015

Learning Spatiotemporal Features with 3D Convolutional Networks

arXiv preprint

We propose C3D, a simple and effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large-scale supervised video dataset.

By: Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri
June 22, 2015

Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

arXiv preprint

We introduce a generative parametric model capable of producing high-quality samples of natural images.

By: Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus