November 2, 2016

DQBarge: Improving Data-Quality Tradeoffs in Large-Scale Internet Services

OSDI 2016

DQBarge is a system that enables better data-quality tradeoffs by propagating critical information along the causal path of request processing.

Jason Flinn, Kaushik Veeraraghavan, Michael Cafarella, Michael Chow
November 2, 2016

Kraken: Leveraging Live Traffic Tests to Identify and Resolve Resource Utilization Bottlenecks in Large Scale Web Services

OSDI 2016

Kraken is a new system that runs load tests by continually shifting live user traffic to one or more data centers.

Kaushik Veeraraghavan, Justin Meza, David Chou, Wonho Kim, Sonia Margulis, Scott Michelson, Rajesh Nishtala, Daniel Obenshain, Dmitri Perelman, Yee Jiun Song
November 1, 2016

Neural Text Generation from Structured Data with Application to the Biography Domain

Empirical Methods in Natural Language Processing (EMNLP)

This paper introduces a neural model for concept-to-text generation that scales to large, rich domains.

Remi Lebret, David Grangier, Michael Auli
October 31, 2016

Online Social Integration is Associated with Reduced Mortality Risk


We reference 12 million social media profiles against California Department of Public Health vital records and use longitudinal statistical models to assess whether social media use is associated with longer life.

Moira Burke, William Hobbs, James H. Fowler, Nicholas Christakis
October 28, 2016

Bilingual Methods for Adaptive Training Data Selection for Machine Translation

Association for Machine Translation in the Americas

We propose a new data selection method which uses semi-supervised convolutional neural networks based on bitokens (Bi-SSCNNs) for training machine translation systems from a large bilingual

Boxing Chen, Roland Kuhn, George Foster, Colin Cherry, Fei Huang
October 10, 2016

Revisiting Visual Question Answering Baselines

European Conference on Computer Vision 2016

This paper questions the value of common practices and develops a simple alternative model based on binary classification.

Allan Jabri, Armand Joulin, Laurens van der Maaten
October 10, 2016

Polysemous Codes

European Conference on Computer Vision 2016 (ECCV)

This paper considers the problem of approximate nearest neighbor search in the compressed domain.

Matthijs Douze, Hervé Jégou, Florent Perronnin
October 10, 2016

Learning to Refine Object Segments

European Conference on Computer Vision

In this work we propose to augment feedforward nets for object segmentation with a novel top-down refinement approach.

Pedro O. Pinheiro, Tsung-Yi Lin, Ronan Collobert, Piotr Dollar
October 8, 2016

Learning Visual Features from Large Weakly Supervised Data

European Conference on Computer Vision

In this paper, we explore the potential of leveraging massive, weakly-labeled image collections for learning good visual features.

Allan Jabri, Armand Joulin, Laurens van der Maaten, Nicolas Vasilache
September 18, 2016

A MultiPath Network for Object Detection


We test three modifications to the standard Fast R-CNN object detector to determine whether they can overcome the object detection challenges in the COCO object detection dataset.

Sergey Zagoruyko, Adam Lerer, Tsung-Yi Lin, Pedro O. Pinheiro, Sam Gross, Soumith Chintala, Piotr Dollar
September 11, 2016

EyeContact: Scleral Coil Eye Tracking for Virtual Reality

International Symposium on Wearable Computers 2016

We investigate a wearable high temporal and spatial resolution eye tracking system based on magnetic tracking and using scleral search coils (SSC).

Robert Cavin, Brian Scally, David Perek, Eric Whitmire, Shwetak Patel, Jim Phillips
September 8, 2016

Joint Learning of Speaker and Phonetic Similarities with Siamese Networks

Interspeech 2016

We scale up the joint learning of specialized speaker and phone embedding architectures to the 360 hours of the Librispeech corpus by implementing a sampling method that efficiently selects pairs of words from the dataset and by improving the loss function.

Neil Zeghidour, Gabriel Synnaeve, Nicolas Usunier, Emmanuel Dupoux
September 5, 2016

Cubrick: Indexing Millions of Records per Second for Interactive Analytics

VLDB 2016

This paper describes the architecture and design of Cubrick, a distributed multidimensional in-memory DBMS suited for interactive analytics over highly dynamic datasets.

Pedro Eugenio Rocha Pedreira, Chris Croswhite, Luis Bona
September 1, 2016

Desugaring Haskell’s Do-Notation into Applicative Operations

ACM SIGPLAN Haskell Symposium

In this paper we show how to re-use the very same do-notation to work for Applicatives as well, providing efficiency benefits for some types that are both Monad and Applicative, and syntactic convenience for those that are merely Applicative.

Andrey Mokhov, Edward Kmett, Simon Marlow, Simon Peyton Jones
August 23, 2016

Robotron: Top-down Network Management at Facebook Scale


In this paper, we present Robotron, a system for managing a massive production network in a top-down fashion.

Yu-Wei Eric Sung, Xiaozheng Tie, Starsky H.Y. Wong, James Hongyi Zeng
August 16, 2016

Synergy of Monotonic Rules


This article describes a method for constructing a special rule (which we call the synergy rule) that takes as input the outputs (scores) of several monotonic rules that solve the same pattern recognition problem.

Vladimir Vapnik, Rauf Izmailov
August 13, 2016

Compressing Graphs and Indexes with Recursive Graph Bisection


Graph reordering is a powerful technique to increase the locality of the representations of graphs, which can be helpful in several applications. We study how the technique can be used to improve compression of graphs and inverted indexes.

Laxman Dhulipala, Igor Kabiljo, Brian Karrer, Giuseppe Ottaviano, Sergey Pupyrev, Alon Shalita
August 12, 2016

Towards Optimal Cardinality Estimation of Unions and Intersections with Sketches

ACM Conference on Knowledge Discovery and Data Mining

This paper presents and analyzes two new classes of methods for estimating cardinalities of intersections and unions from sketches.

Daniel Ting
August 11, 2016

Semi-Supervised Convolutional Networks for Translation Adaptation with Tiny Amount of In-domain Data

Conference on Computational Natural Language Learning

We propose a method which uses semi-supervised convolutional neural networks (CNNs) to select in-domain training data for statistical machine translation.

Boxing Chen, Fei Huang
August 10, 2016

Neural Network-Based Word Alignment through Score Aggregation

Association for Computational Linguistics Conference on Machine Translation

We present a simple neural network for word alignment that builds source and target word window representations to compute alignment scores for sentence pairs.

Joel Legrand, Michael Auli, Ronan Collobert