May 2, 2016

Facebook AI Research at ICLR 2016

By: Ari Entin

This week at the International Conference on Learning Representations (ICLR), members from Facebook AI Research are presenting 12 Conference Track papers and 3 Workshop Track papers.

At the conference, the artificial intelligence and machine learning communities will gather to discuss how best to learn meaningful and useful representations of data for application areas such as vision, speech, audio and natural language processing. Topics at this conference include deep learning and feature learning, metric learning, kernel learning, compositional models, non-linear structured prediction, and issues regarding non-convex optimization.

Descriptions explaining some of this research and a complete list of papers are below. Additional information can be found here.

Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks

Authors: J. Weston, A. Bordes, S. Chopra, A. M. Rush, B. van Merriënboer, A. Joulin, T. Mikolov
Overview: We propose a set of 20 toy question answering tasks that involve reading comprehension over short stories. Performing well on them requires machine learning techniques with reasoning and memory. Our tasks are hence one way to measure progress towards natural language understanding, and we hope they will help encourage the development of new methods for that goal.
[data & code]

Deep Multi-Scale Video Prediction Beyond Mean Square Error

Authors: M. Mathieu, C. Couprie, Y. LeCun
Overview: Learning to predict future images from a video sequence involves the construction of an internal representation that models the image evolution accurately, and therefore, to some degree, its content and dynamics. In this work, we train a convolutional network to generate future frames given an input sequence. To deal with the inherently blurry predictions obtained from the standard Mean Squared Error (MSE) loss function, we propose three different and complementary feature learning strategies: a multi-scale architecture, an adversarial training method, and an image gradient difference loss function.
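
As a rough illustration of the third strategy, the sketch below (NumPy, with an assumed sharpness exponent `alpha`) penalizes discrepancies between the spatial gradients of predicted and ground-truth frames rather than raw pixel values, which is the intuition behind a gradient difference loss. It is a simplified reading of the idea, not the paper's exact formulation.

```python
import numpy as np

def gradient_difference_loss(pred, target, alpha=1.0):
    """Simplified gradient difference loss sketch: compare the spatial
    gradient magnitudes of the predicted and true frames instead of their
    raw pixel values. `pred` and `target` are H x W (or H x W x C) float
    arrays; `alpha` is an assumed exponent, not a value from the paper."""
    # Absolute finite differences along width (axis=1) and height (axis=0).
    pred_dx, pred_dy = np.abs(np.diff(pred, axis=1)), np.abs(np.diff(pred, axis=0))
    true_dx, true_dy = np.abs(np.diff(target, axis=1)), np.abs(np.diff(target, axis=0))
    # Penalize mismatches between predicted and true gradient magnitudes,
    # which encourages sharper edges than a plain MSE objective.
    loss_x = np.abs(true_dx - pred_dx) ** alpha
    loss_y = np.abs(true_dy - pred_dy) ** alpha
    return loss_x.sum() + loss_y.sum()
```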

Unifying Distillation and Privileged Information

Authors: D. Lopez-Paz, L. Bottou, B. Schölkopf, V. Vapnik
Overview: Humans learn much faster than machines. Our work tackles this gap by allowing machines to learn not only from data, but also from other intelligent machines. To this end, our work unifies two seminal ideas: Hinton's distillation and Vapnik's privileged information. We complement our ideas with numerical simulations, and with future research directions relating to semi-supervised learning, multitask learning, Universum learning, and curriculum learning.
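
For readers unfamiliar with distillation, the sketch below (NumPy) shows the usual recipe of training a student model against a teacher's softened outputs alongside the true labels. The temperature and imitation weight are illustrative values, and the sketch does not attempt the paper's extension to privileged information.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax with a max-shift for numerical stability.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, imitation_weight=0.5):
    """Hinton-style distillation sketch. `student_logits` and `teacher_logits`
    are 1-D arrays of class scores, `hard_label` is the true class index;
    `temperature` and `imitation_weight` are assumed hyper-parameters."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    hard_student = softmax(student_logits)
    # Cross-entropy against the teacher's softened distribution ("dark knowledge").
    soft_loss = -np.sum(soft_teacher * np.log(soft_student + 1e-12))
    # Standard cross-entropy against the ground-truth label.
    hard_loss = -np.log(hard_student[hard_label] + 1e-12)
    return imitation_weight * soft_loss + (1.0 - imitation_weight) * hard_loss
```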

Predicting distributions with Linearizing Belief Networks

Authors: Y. Dauphin, D. Grangier
Overview: Conditional belief networks introduce stochastic binary variables into neural networks. Contrary to a classical neural network, a belief network can predict more than the expected value of the output Y given the input X: it can predict a distribution over outputs Y, which is useful when an input admits multiple outputs whose average is not necessarily a valid answer. Such networks are particularly relevant to inverse problems such as image denoising or text-to-speech. However, traditional sigmoid belief networks are hard to train and are not suited to continuous problems. This work introduces a new family of networks for continuous problems called linearizing belief nets, or LBNs. The model trains efficiently and improves the state of the art on image denoising and on facial expression generation with the Toronto Faces dataset.
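
To make the "stochastic binary variables" concrete, here is a toy sketch (NumPy, with hypothetical layer sizes) of a sigmoid-gated binary hidden layer: because the hidden states are sampled, repeated forward passes on the same input produce different outputs, which is what lets such networks represent a distribution over Y rather than only its mean. This illustrates classic sigmoid belief units, not the linearizing construction the paper introduces.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_binary_layer(x, weights, bias):
    """Toy stochastic binary hidden layer (not the paper's LBN architecture):
    unit activation probabilities come from a sigmoid, and the hidden states
    are sampled as Bernoulli variables."""
    p = 1.0 / (1.0 + np.exp(-(x @ weights + bias)))   # activation probabilities
    return (rng.random(p.shape) < p).astype(float)    # sampled binary states

# Hypothetical sizes: the same input yields a spread of hidden samples,
# i.e. a distribution rather than a single deterministic output.
x = rng.normal(size=4)
W = rng.normal(size=(4, 8))
b = np.zeros(8)
samples = np.stack([stochastic_binary_layer(x, W, b) for _ in range(5)])
```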

Sequence Level Training with Recurrent Neural Networks

Authors: M. Ranzato, S. Chopra, M. Auli, W. Zaremba
Overview: This paper presents a new algorithm for training models that generate text, useful in applications like machine translation and summarization. While current algorithms merely aim at predicting the next word in a given sequence, our approach predicts whole sentences and directly optimizes the metric used at test time. Our experiments show consistent improvements on three different tasks: machine translation, summarization and image captioning.
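
A minimal sketch of the sequence-level idea, under a reinforcement-learning view with hypothetical precomputed rewards: instead of a per-word loss, the whole sampled sentence is scored with the evaluation metric (e.g. BLEU or ROUGE), and its log-probability is scaled by how much that score exceeds a baseline. The paper's actual training procedure is more involved than this single objective.

```python
import numpy as np

def sequence_level_loss(log_probs_sampled, sampled_reward, baseline_reward):
    """REINFORCE-style sequence-level objective sketch. `log_probs_sampled`
    holds the model's log-probability of each word in a sentence sampled
    from the model; the reward values (metric scores) are assumed to be
    precomputed elsewhere."""
    advantage = sampled_reward - baseline_reward
    # Minimizing this loss raises the probability of sentences that score
    # better than the baseline and lowers it for sentences that score worse.
    return -advantage * np.sum(log_probs_sampled)
```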

Metric Learning with Adaptive Density Discrimination

Authors: O. Rippel, M. Paluri, P. Dollar, L. Bourdev
Overview: Sculpting visual representations is a challenging problem. In this work we propose Magnet loss, a new loss function that uses distance metric learning to sculpt the visual space locally. We show that Magnet loss not only trains faster but also improves classification accuracy on many standard benchmarks, and, more importantly, that the resulting visual space can expose hidden attributes of the data that were not part of the labels.
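
For context on what "distance metric learning" means here, below is the classical triplet loss (NumPy, with an assumed margin). It is not the Magnet loss itself, which adapts to local cluster densities rather than comparing individual examples, but it shows the basic mechanism of pulling same-class points together and pushing different-class points apart in the representation space.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Classical triplet loss for distance metric learning (shown for context
    only, not the Magnet loss): pull the anchor embedding toward a same-class
    example and push it away from a different-class example by at least
    `margin`, an assumed value."""
    d_pos = np.sum((anchor - positive) ** 2)   # squared distance to same-class point
    d_neg = np.sum((anchor - negative) ** 2)   # squared distance to different-class point
    return max(0.0, d_pos - d_neg + margin)
```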

Alternative structures for character-level RNNs

Authors: P. Bojanowski, A. Joulin, T. Mikolov
Overview: When recurrent neural networks (RNNs) are applied at the level of characters instead of words, the hidden representation needs to be large in order to successfully model long-term dependencies. This in turn implies higher computational costs, which can become prohibitive in practice. We propose two alternative structural modifications to the classical RNN model to cope with these issues.

The Goldilocks Principle: Reading Children’s Books with Explicit Memory Representations

Authors: F. Hill, A. Bordes, S. Chopra, J. Weston
Overview: Our artificial intelligence team trained a computer to predict missing words in children's stories, a benchmark we call "The Children's Book Test".
Historically, computers have been able to predict simple words like "on" or "at" and verbs like "run" or "eat", but they do less well at predicting nouns like "ball" or "table" and people's names. In this research, we trained the computer to look at the context of a sentence and much more accurately predict those harder words, the nouns and names, which are often the most important parts of a sentence. The predictions were most accurate when the model looked at just the right amount of context around the relevant words: not too much and not too little. We call this "The Goldilocks Principle".


Stacked What-Where Auto-encoders
Jake Zhao, Michael Mathieu, Ross Goroshin, Yann LeCun

Universum Prescription: Regularization using Unlabeled Data
Xiang Zhang, Yann LeCun

Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems
Jesse Dodge, Andreea Gane, Xiang Zhang, Antoine Bordes, Sumit Chopra, Alexander Miller, Arthur Szlam, Jason Weston

Particular object retrieval with integral max-pooling of CNN activations
Giorgos Tolias, Ronan Sicre, Hervé Jegou

Super-resolution with deep convolutional sufficient statistics
Joan Bruna Estrach, Pablo Sprechmann, Yann LeCun

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Alec Radford, Luke Metz, Soumith Chintala

Better Computer Go Player with Neural Network and Long-term Prediction
Yuandong Tian, Yan Zhu