This week at the International Conference on Learning Representations (ICLR), members from Facebook AI Research are presenting 12 Conference Track papers and 3 Workshop Track papers.
At the conference, the artificial intelligence and machine learning communities will gather to discuss how best to learn meaningful and useful representations of data for application areas such as vision, speech, audio, and natural language processing. Topics at this conference include deep learning and feature learning, metric learning, kernel learning, compositional models, non-linear structured prediction, and issues surrounding non-convex optimization.
Descriptions explaining some of this research and a complete list of papers are below.
Authors: J. Weston, A. Bordes, S. Chopra, A. M. Rush, B. van Merriënboer, A. Joulin, T. Mikolov
Overview: We propose a set of 20 toy question answering tasks that involve reading comprehension over short stories. To perform well, machine learning techniques with reasoning and memory are required. Our tasks are hence one way to measure progress towards natural language understanding, and we hope they will help encourage the development of new methods for that goal.
[data & code]
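To give a sense of the data, here is a minimal Python sketch with a made-up two-sentence story, in the tab-separated format such toy QA tasks typically use: numbered story statements, with question lines carrying the answer and a supporting-fact id.

```python
# A hypothetical example in the typical toy-QA file format:
# numbered statements; question lines are tab-separated as
# "question \t answer \t supporting fact id".
SAMPLE = """1 Mary moved to the bathroom.
2 John went to the hallway.
3 Where is Mary?\tbathroom\t1
"""

def parse_task(text):
    """Parse raw task text into (story, question, answer) triples."""
    story, examples = [], []
    for line in text.strip().split("\n"):
        idx, content = line.split(" ", 1)
        if idx == "1" and story:
            story = []  # the line numbering restarting at 1 marks a new story
        if "\t" in content:
            question, answer, support = content.split("\t")
            examples.append((list(story), question, answer))
        else:
            story.append(content)
    return examples

examples = parse_task(SAMPLE)
```

A model is scored by how often it produces the correct answer ("bathroom" here) given the story and question.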
Authors: M. Mathieu, C. Couprie, Y. LeCun
Overview: Learning to predict future images from a video sequence involves the construction of an internal representation that models the image evolution accurately, and therefore, to some degree, its content and dynamics. In this work, we train a convolutional network to generate future frames given an input sequence. To deal with the inherently blurry predictions obtained from the standard Mean Squared Error (MSE) loss function, we propose three different and complementary feature learning strategies: a multi-scale architecture, an adversarial training method, and an image gradient difference loss function.
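The image gradient difference idea can be sketched in a few lines of NumPy (a simplified version; `alpha` and the toy images below are illustrative, not from the paper):

```python
import numpy as np

def gradient_difference_loss(pred, target, alpha=1.0):
    """Penalize mismatches between the spatial gradients of the predicted
    and target frames; a blurry prediction loses edges and is punished
    even when its per-pixel MSE is low."""
    # Horizontal and vertical finite differences of both images.
    dx_p = np.abs(np.diff(pred, axis=1))
    dx_t = np.abs(np.diff(target, axis=1))
    dy_p = np.abs(np.diff(pred, axis=0))
    dy_t = np.abs(np.diff(target, axis=0))
    return (np.abs(dx_t - dx_p) ** alpha).sum() + (np.abs(dy_t - dy_p) ** alpha).sum()

target = np.array([[0.0, 1.0], [0.0, 1.0]])  # a sharp vertical edge
sharp  = target.copy()
blurry = np.full_like(target, 0.5)           # uniform gray: what MSE alone favors

gradient_difference_loss(sharp, target)   # → 0.0
gradient_difference_loss(blurry, target)  # → 2.0: the lost edge is penalized
```

The toy example shows why this complements MSE: the gray prediction averages the two plausible pixel values, but its gradients do not match the target's edge.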
Authors: D. Lopez-Paz, L. Bottou, B. Schölkopf, V. Vapnik
Overview: Humans learn much faster than machines. Our work tackles this question by allowing machines to learn not only from data, but also from other intelligent machines. To this end, our work unifies two seminal ideas: Hinton’s distillation and Vapnik’s privileged information. We complement our ideas with some numerical simulations, and with future research directions relating to semi-supervised learning, multitask learning, Universum learning, and curriculum learning.
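A rough sketch of the distillation half of this combination: the student learns from a blend of the hard labels and the teacher's softened predictions (the temperature `T` and mixing weight `lam` below are hypothetical values, not from the paper):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T produces softer distributions."""
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_targets(teacher_logits, onehot, T=2.0, lam=0.5):
    """Blend the hard labels with the teacher's softened outputs,
    in the style of Hinton's distillation."""
    soft = softmax(teacher_logits, T=T)
    return lam * onehot + (1 - lam) * soft

teacher_logits = np.array([[2.0, 0.0, -1.0]])  # a teacher confident in class 0
onehot = np.array([[1.0, 0.0, 0.0]])
targets = distillation_targets(teacher_logits, onehot)
```

The soft targets carry the teacher's dark knowledge about how wrong each incorrect class is, which is one way an intelligent machine can teach another.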
Authors: Y. Dauphin, D. Grangier
Overview: Conditional belief networks introduce stochastic binary variables into neural networks. Unlike a classical neural network, a belief network can predict more than the expected value of the output Y given the input X: it can predict a distribution over outputs Y, which is useful when an input admits multiple valid outputs whose average is not necessarily a valid answer. Such networks are particularly relevant to inverse problems such as image de-noising or text-to-speech. However, traditional sigmoid belief networks are hard to train and are not suited to continuous problems. This work introduces a new family of networks for continuous problems called linearizing belief nets, or LBNs. The model trains efficiently and improves the state of the art on image de-noising and on facial expression generation with the Toronto Faces dataset.
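The role of the stochastic binary variables can be illustrated with a minimal NumPy sketch (a generic stochastic layer, not the LBN architecture itself; the inputs and weights are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_binary_layer(x, W, b):
    """Each hidden unit fires (outputs 1) with probability sigmoid(Wx + b).
    Sampling, rather than taking the mean activation, is what lets the
    network represent multi-modal output distributions."""
    p = 1.0 / (1.0 + np.exp(-(x @ W + b)))
    return (rng.random(p.shape) < p).astype(float), p

x = np.array([0.5, -1.0])
W = np.zeros((2, 4))  # hypothetical weights giving p = 0.5 for every unit
b = np.zeros(4)
h, p = stochastic_binary_layer(x, W, b)

# Averaging many samples recovers the underlying firing probabilities.
samples = np.mean([stochastic_binary_layer(x, W, b)[0] for _ in range(2000)], axis=0)
```

Each forward pass yields a different binary code `h`, so repeated passes through the same input can produce distinct, individually valid outputs instead of their blurry average.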
Authors: M. Ranzato, S. Chopra, M. Auli, W. Zaremba
Overview: This paper presents a new algorithm for training models that generate text, useful in applications like machine translation and summarization. While current algorithms merely aim to predict the next word in a given sequence, our approach predicts whole sentences and directly optimizes the metric used at test time. Our experiments show consistent improvements across three different tasks: machine translation, summarization, and image captioning.
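The idea of directly optimizing a sentence-level metric can be sketched with a toy REINFORCE-style loop: sample a whole sentence, score it with the test metric, and reinforce tokens in proportion to how much the score beats a baseline. Everything below (vocabulary, reward, hyperparameters) is illustrative, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, TARGET = 3, [2, 0, 1]             # toy vocabulary and reference "sentence"
logits = np.zeros((len(TARGET), VOCAB))  # one categorical per position

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def sentence_reward(sample):
    # A sentence-level metric: fraction of correct tokens (a stand-in for BLEU).
    return float(np.mean([s == t for s, t in zip(sample, TARGET)]))

lr, baseline = 0.3, 0.0
for _ in range(1000):
    probs = softmax(logits)
    sample = [rng.choice(VOCAB, p=probs[i]) for i in range(len(TARGET))]
    r = sentence_reward(sample)
    baseline = 0.9 * baseline + 0.1 * r        # moving-average reward baseline
    for i, tok in enumerate(sample):
        grad = -probs[i]                       # d log p(tok) / d logits
        grad[tok] += 1.0
        logits[i] += lr * (r - baseline) * grad

final = softmax(logits)  # probability mass concentrates on the reference tokens
```

Unlike next-word training, the update signal here is the score of the entire sampled sentence, so the model is trained on the quantity it is evaluated on.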
Authors: O. Rippel, M. Paluri, P. Dollar, L. Bourdev
Overview: Sculpting visual representations is a challenging problem. In this work we propose the Magnet loss, a new loss function that uses distance metric learning to sculpt the visual representation space locally. We show that the Magnet loss not only trains faster but also improves classification accuracy on many standard benchmarks. More importantly, we show that the resulting visual space can expose hidden attributes of the data that were not part of the labels.
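A simplified sketch of a Magnet-style objective: an embedding should be close to its own class's cluster center and, by a margin, far from other classes' centers. The full method's k-means clustering and variance normalization are omitted, and the centers and margin below are made up:

```python
import numpy as np

def magnet_style_loss(x, centers, true_class, gap=1.0):
    """Hinged log-ratio of similarity to the true class's center versus
    the other classes' centers (variance scaling omitted for brevity)."""
    d = np.array([np.sum((x - c) ** 2) for c in centers])  # squared distances
    others = np.delete(d, true_class)
    # -log( exp(-d_true - gap) / sum_c exp(-d_c) ), clipped at zero.
    return max(0.0, d[true_class] + gap + np.log(np.exp(-others).sum()))

centers = [np.zeros(2), np.array([5.0, 5.0])]   # two hypothetical class centers
near = np.zeros(2)          # sits on its own class's center: no penalty
far = np.array([5.0, 5.0])  # sits on the wrong center: penalized
```

Because the pull and push act on local cluster structure rather than single class prototypes, the learned space can separate sub-populations within a class, which is how hidden attributes become visible.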
Authors: P. Bojanowski, A. Joulin, T. Mikolov
Overview: When recurrent neural networks (RNNs) are applied at the level of characters instead of words, the hidden representation needs to be large in order to successfully model long-term dependencies. This in turn implies higher computational costs, which can become prohibitive in practice. We propose two alternative structural modifications to the classical RNN model to cope with these issues.
Authors: F.Hill, A. Bordes, S. Chopra, J. Weston
Overview: Our artificial intelligence team has trained a computer to predict missing words in children's stories, a benchmark we call "The Children's Book Test".
Historically, computers have been able to predict simple words like “on” or “at” and verbs like “run” or “eat”, but they don’t do as well at predicting nouns like “ball”, “table” or people’s names. For this research, we trained the computer to look at the context of a sentence and much more accurately predict those more difficult words — nouns and names — which are often the most important parts of sentences. The computer’s predictions were most accurate when it looked at just the right amount of context around relevant words — not too much and not too little. We call this “The Goldilocks Principle”.
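A toy illustration of the cloze setup, with a naive context-frequency baseline (the sentences and candidate words are made up for illustration, not drawn from the benchmark):

```python
# A hypothetical cloze query: a noun is removed from the final sentence
# and must be predicted from the preceding context.
context = ["Sara loved her ball.",
           "The ball rolled under the table.",
           "Sara crawled after it."]
query = "She reached out and grabbed the XXXXX ."
candidates = ["ball", "table", "door", "cat"]

def frequency_baseline(context, candidates):
    """A naive baseline: pick the candidate mentioned most often in the
    context. Models with explicit memories attend over the context
    instead of just counting."""
    text = " ".join(context).lower()
    return max(candidates, key=lambda c: text.count(c))

frequency_baseline(context, candidates)  # → "ball"
```

Predicting nouns and names well requires this kind of use of context; the Goldilocks finding is that the window of context consulted must be neither too wide nor too narrow.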
Particular object retrieval with integral max-pooling of CNN activations
Authors: Giorgos Tolias, Ronan Sicre, Hervé Jégou