Machine learning and computational neuroscience experts are gathering in Montréal from Sunday December 2nd through Saturday December 8th for the Thirty-second Annual Conference on Neural Information Processing Systems (NeurIPS). Research from Facebook will be presented in oral paper and poster sessions. Facebook researchers and engineers will also be organizing and participating in workshops throughout the week.
For those unable to attend, the conference will be livestreamed on the NeurIPS Facebook page from December 3rd through December 6th.
Facebook research being presented at NeurIPS 2018
Learning to capture long-range relations is fundamental to image/video recognition. Existing CNN models generally rely on increasing depth to model such relations, which is highly inefficient. In this work, we propose the “double attention block”, a novel component that aggregates and propagates informative global features from the entire spatio-temporal space of input images/videos, enabling subsequent convolution layers to access features from the entire space efficiently. The component is designed with a double attention mechanism in two steps, where the first step gathers features from the entire space into a compact set through second-order attention pooling and the second step adaptively selects and distributes features to each location via another attention. The proposed double attention block is easy to adopt and can be plugged into existing deep neural networks conveniently. We conduct extensive ablation studies and experiments on both image and video recognition tasks for evaluating its performance. On the image recognition task, a ResNet-50 equipped with our double attention blocks outperforms a much larger ResNet-152 architecture on the ImageNet-1k dataset with over 40% fewer parameters and fewer FLOPs. On the action recognition task, our proposed model achieves state-of-the-art results on the Kinetics and UCF-101 datasets with significantly higher efficiency than recent works.
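As a rough illustration of the two-step gather-then-distribute mechanism described above, here is a toy NumPy sketch over flattened spatio-temporal locations. The projection matrices `w_gather` and `w_distrib` are hypothetical stand-ins for the paper's learned convolutions; this is a shape-level sketch, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def double_attention(feats, w_gather, w_distrib):
    # feats: (N, C) features over N flattened spatio-temporal locations
    # Step 1: gather -- second-order attention pooling into K global descriptors
    attn = softmax(feats @ w_gather, axis=0)     # (N, K), normalized over locations
    globals_ = feats.T @ attn                    # (C, K) global feature descriptors
    # Step 2: distribute -- each location adaptively selects from the descriptors
    select = softmax(feats @ w_distrib, axis=1)  # (N, K), normalized over descriptors
    return select @ globals_.T                   # (N, C), same shape as the input

rng = np.random.default_rng(0)
N, C, K = 16, 8, 4
out = double_attention(rng.normal(size=(N, C)),
                       rng.normal(size=(C, K)), rng.normal(size=(C, K)))
print(out.shape)  # (16, 8)
```

Because the output has the same shape as the input, such a block can be dropped between existing convolution layers, which is the "plug in conveniently" property the abstract highlights.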
A Block Coordinate Ascent Algorithm for Mean-Variance Optimization
Tengyang Xie, Bo Liu, Yangyang Xu, Mohammad Ghavamzadeh, Yinlam Chow, Daoming Lyu and Daesub Yoon
Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare. The mean-variance function is one of the most widely used objective functions in risk management due to its simplicity and interpretability. Existing algorithms for mean-variance optimization are based on multi-time-scale stochastic approximation, whose learning rate schedules are often hard to tune, and which have only asymptotic convergence proofs. In this paper, we develop a model-free policy search framework for mean-variance optimization with finite-sample error bound analysis (to local optima). Our starting point is a reformulation of the original mean-variance function with its Legendre-Fenchel dual, from which we propose a stochastic block coordinate ascent policy search algorithm. Both the asymptotic convergence guarantee of the last iteration’s solution and the convergence rate of the randomly picked solution are provided, and their applicability is demonstrated on several benchmark domains.
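A small numerical check of the duality idea behind such a reformulation: the identity E[R]² = max_y (2y·E[R] − y²) is maximized at y = E[R], which lets a mean-variance objective be written with an extra scalar variable and optimized by alternating (block coordinate) ascent. This toy verifies the identity on samples; it is an illustration of the dual trick in general, not the paper's algorithm.

```python
import numpy as np

# Fenchel-dual identity: E[R]^2 = max_y (2*y*E[R] - y^2), argmax y = E[R].
# Since Var[R] = E[R^2] - E[R]^2, the mean-variance objective gains a
# concave inner maximization over y, enabling block coordinate ascent.
rng = np.random.default_rng(1)
returns = rng.normal(loc=2.0, scale=0.5, size=100_000)
mean = returns.mean()

ys = np.linspace(-5, 5, 2001)           # grid search over the dual variable
dual_vals = 2 * ys * mean - ys**2
best_y = ys[dual_vals.argmax()]
print(best_y, mean)                      # best_y lands on the grid point nearest E[R]
```

In an actual policy search loop, one block updates the policy parameters and the other block updates y in closed form, since the inner maximizer is just the current mean return.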
A Lyapunov-based Approach to Safe Reinforcement Learning
Yinlam Chow, Ofir Nachum, Edgar Duenez-Guzman and Mohammad Ghavamzadeh
In many real-world reinforcement learning (RL) problems, besides optimizing the main objective function, an agent must concurrently avoid violating a number of constraints. In particular, besides optimizing performance, it is crucial to guarantee the safety of an agent during training as well as deployment (e.g., a robot should avoid taking actions – exploratory or not – which irrevocably harm its hardware). To incorporate safety in RL, we derive algorithms under the framework of constrained Markov decision processes (CMDPs), an extension of the standard Markov decision processes (MDPs) augmented with constraints on expected cumulative costs. Our approach hinges on a novel Lyapunov method. We define and present a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local linear constraints. Leveraging these theoretical underpinnings, we show how to use the Lyapunov approach to systematically transform dynamic programming (DP) and RL algorithms into their safe counterparts. To illustrate their effectiveness, we evaluate these algorithms in several CMDP planning and decision-making tasks on a safety benchmark domain. Our results show that our proposed method significantly outperforms existing baselines in balancing constraint satisfaction and performance.
Solomonoff’s general theory of inference (Solomonoff, 1964) and the Minimum Description Length principle (Grünwald, 2007; Rissanen, 2007) formalize Occam’s razor, and hold that a good model of data is a model that is good at losslessly compressing the data, including the cost of describing the model itself. Deep neural networks might seem to go against this principle given the large number of parameters to be encoded. We demonstrate experimentally the ability of deep neural networks to compress the training data even when accounting for parameter encoding. The compression viewpoint originally motivated the use of variational methods in neural networks (Hinton and Van Camp, 1993; Schmidhuber, 1997). Unexpectedly, we found that these variational methods provide surprisingly poor compression bounds, despite being explicitly built to minimize such bounds. This might explain the relatively poor practical performance of variational methods in deep learning. On the other hand, simple incremental encoding methods yield excellent compression values on deep networks, vindicating Solomonoff’s approach.
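The "simple incremental encoding" idea mentioned above can be made concrete with a toy prequential code: each symbol is encoded with a model fitted only on the symbols before it, so the cost of describing the model is folded into the early, poorly-predicted symbols. The Bernoulli/Laplace model below is a minimal stand-in for a neural network trained incrementally, not the paper's setup.

```python
import math

# Prequential (incremental) codelength of a binary sequence: code each bit
# with a Bernoulli model estimated from the preceding bits (Laplace rule),
# so no separate cost is paid for transmitting the model parameters.
def prequential_codelength(bits):
    ones, total, codelen = 0, 0, 0.0
    for b in bits:
        p_one = (ones + 1) / (total + 2)     # Laplace rule of succession
        p = p_one if b == 1 else 1 - p_one
        codelen += -math.log2(p)             # ideal code: -log2 probability
        ones += b
        total += 1
    return codelen

biased = [1] * 90 + [0] * 10                 # compressible: mostly ones
print(prequential_codelength(biased))        # well below the 100 bits of a uniform code
```

A uniform code would spend exactly 1 bit per symbol; the incremental code beats it on compressible data even though the model was never transmitted explicitly, which is the sense in which deep networks can be said to compress their training data.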
Thomas George, César Laurent, Xavier Bouthillier, Nicolas Ballas and Pascal Vincent
Optimization algorithms that leverage gradient covariance information, such as variants of natural gradient descent (Amari, 1998), offer the prospect of yielding more effective descent directions. For models with many parameters, the covariance matrix they are based on becomes gigantic, making them inapplicable in their original form. This has motivated research into both simple diagonal approximations and more sophisticated factored approximations such as KFAC (Heskes, 2000; Martens & Grosse, 2015; Grosse & Martens, 2016). In the present work we draw inspiration from both to propose a novel approximation that is provably better than KFAC and amenable to cheap partial updates. It consists in tracking a diagonal variance, not in parameter coordinates, but in a Kronecker-factored eigenbasis, in which the diagonal approximation is likely to be more effective. Experiments show improvements over KFAC in optimization speed for several deep network architectures.
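A rough NumPy sketch of the core idea: rotate a layer's gradient into the eigenbasis of the two Kronecker factors, divide by a per-coordinate variance estimated directly in that basis (rather than by KFAC's product of factor eigenvalues), and rotate back. The random covariance factors and sample gradients below are placeholders for the statistics a training loop would track; this is an assumption-laden sketch, not the authors' code.

```python
import numpy as np

# Precondition a weight gradient in a Kronecker-factored eigenbasis.
rng = np.random.default_rng(2)
n_out, n_in = 5, 7
A = np.cov(rng.normal(size=(n_in, 200)))    # input-side covariance factor
B = np.cov(rng.normal(size=(n_out, 200)))   # output-side covariance factor
_, U_A = np.linalg.eigh(A)                   # eigenbases of the two factors
_, U_B = np.linalg.eigh(B)

grads = rng.normal(size=(100, n_out, n_in))              # sample gradients
proj = np.einsum('oi,nij,jk->nok', U_B.T, grads, U_A)    # rotate into eigenbasis
var = (proj ** 2).mean(axis=0) + 1e-3                    # diagonal variance, tracked there

g = grads[0]
g_precond = U_B @ ((U_B.T @ g @ U_A) / var) @ U_A.T      # rotate, scale, rotate back
print(g_precond.shape)  # (5, 7)
```

The "cheap partial updates" follow from this split: the diagonal variance can be refreshed every step at low cost, while the expensive eigendecompositions are only recomputed occasionally.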
Fighting Boredom in Recommender Systems with Linear Reinforcement Learning
Romain Warlop, Alessandro Lazaric and Jérémie Mary
A common assumption in recommender systems (RS) is the existence of a best fixed recommendation strategy. Such a strategy may be simple and work at the item level (e.g., in multi-armed bandit it is assumed one best fixed arm/item exists) or implement a more sophisticated RS (e.g., the objective of A/B testing is to find the best fixed RS and execute it thereafter). We argue that this assumption is rarely verified in practice, as the recommendation process itself may impact the user’s preferences. For instance, a user may get bored by a strategy, while she may gain interest again, if enough time passed since the last time that strategy was used. In this case, a better approach consists in alternating different solutions at the right frequency to fully exploit their potential. In this paper, we first cast the problem as a Markov decision process, where the rewards are a linear function of the recent history of actions, and we show that a policy considering the long-term influence of the recommendations may outperform both fixed-action and contextual greedy policies. We then introduce an extension of the UCRL algorithm (LINUCRL) to effectively balance exploration and exploitation in an unknown environment, and we derive a regret bound that is independent of the number of states. Finally, we empirically validate the model assumptions and the algorithm in a number of realistic scenarios.
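A toy simulation makes the "boredom" effect concrete: if an arm's expected reward grows with the time elapsed since it was last played, alternating between arms beats always playing the single best fixed arm. The recency-linear reward and the cap are invented for illustration; the paper's model and the LINUCRL algorithm are more general.

```python
# Toy "boredom" model: reward is linear in recency (time since the arm was
# last played, capped), so a policy that alternates can beat the best fixed arm.
def run(policy, horizon=10_000, cap=5):
    last = {0: 0, 1: 0}
    total = 0.0
    for t in range(horizon):
        a = policy(t)
        recency = min(t - last[a], cap) if t else cap
        total += (1.0 if a == 0 else 0.9) * recency / cap  # base quality x freshness
        last[a] = t
    return total / horizon

fixed = run(lambda t: 0)            # always pull the "best" arm
alternate = run(lambda t: t % 2)    # switch arms every step
print(fixed, alternate)             # alternating earns more per step
```

The fixed policy keeps the best arm permanently "stale" (recency 1), while alternation lets both arms recover, which is exactly the effect a long-horizon MDP policy can exploit and a greedy one cannot.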
We formulate the problem of defogging as state estimation and future state prediction from previous, partial observations in the context of real-time strategy games. We propose to employ encoder-decoder neural networks for this task, and introduce proxy tasks and baselines for evaluation to assess their ability to capture basic game rules and high-level dynamics. By combining convolutional neural networks and recurrent networks, we exploit spatial and sequential correlations and train well-performing models on a large dataset of human games of StarCraft®: Brood War®. Finally, we demonstrate the relevance of our models to downstream tasks by applying them for enemy unit prediction in a state-of-the-art, rule-based StarCraft bot. We observe improvements in win rates against several strong community bots.
Modern deep transfer learning approaches have mainly focused on learning generic feature vectors from one task that are transferable to other tasks, such as word embeddings in language and pretrained convolutional features in vision. However, these approaches usually transfer unary features and largely ignore more structured graphical representations. This work explores the possibility of learning generic latent relational graphs that capture dependencies between pairs of data units (e.g., words or pixels) from large-scale unlabeled data and transferring the graphs to downstream tasks. Our proposed transfer learning framework improves performance on various tasks including question answering, natural language inference, sentiment analysis, and image classification. We also show that the learned graphs are generic enough to be transferred to different embeddings on which the graphs have not been trained (including GloVe embeddings, ELMo embeddings, and task-specific RNN hidden units), or embedding-free units such as image pixels.
Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes
Ronan Fruit, Matteo Pirotta and Alessandro Lazaric
While designing the state space of an MDP, it is common to include states that are transient or not reachable by any policy (e.g., in mountain car, the product space of speed and position contains configurations that are not physically reachable). This results in weakly-communicating or multi-chain MDPs. In this paper, we introduce TUCRL, the first algorithm able to perform efficient exploration-exploitation in any finite Markov Decision Process (MDP) without requiring any form of prior knowledge. In particular, for any MDP with S_C communicating states, A actions and Γ_C ≤ S_C possible communicating next states, we derive a Õ(D_C √(Γ_C S_C A T)) regret bound, where D_C is the diameter (i.e., the length of the longest shortest path between any two states) of the communicating part of the MDP. This is in contrast with existing optimistic algorithms (e.g., UCRL, Optimistic PSRL) that suffer linear regret in weakly-communicating MDPs, as well as posterior sampling or regularised algorithms (e.g., REGAL), which require prior knowledge on the bias span of the optimal policy to achieve sub-linear regret. We also prove that in weakly-communicating MDPs, no algorithm can ever achieve a logarithmic growth of the regret without first suffering a linear regret for a number of steps that is exponential in the parameters of the MDP. Finally, we report numerical simulations supporting our theoretical findings and showing how TUCRL overcomes the limitations of the state-of-the-art.
The study of cross-domain mapping without supervision has recently attracted much attention. Much of the recent progress was enabled by the use of adversarial training as well as cycle constraints. The practical difficulty of adversarial training motivates research into non-adversarial methods. In a recent paper, it was shown that cross-domain mapping is possible without the use of cycles or GANs. Although promising, this approach suffers from several drawbacks, including costly inference and an optimization variable for every training example, which prevents the method from using large training sets. We present an alternative approach which is able to achieve non-adversarial mapping using a novel form of Variational Auto-Encoder. Our method is much faster at inference time, is able to leverage large datasets and has a simple interpretation.
Given a single image x from domain A and a set of images from domain B, our task is to generate the analog of x in B. We argue that this task could be a key AI capability that underlies the ability of cognitive agents to act in the world and present empirical evidence that the existing unsupervised domain translation methods fail on this task. Our method follows a two step process. First, a variational autoencoder for domain B is trained. Then, given the new sample x, we create a variational autoencoder for domain A by adapting the layers that are close to the image in order to directly fit x, and only indirectly adapt the other layers. Our experiments indicate that the new method does as well, when trained on one sample x, as the existing domain transfer methods, when these enjoy a multitude of training samples from domain A. Our code is made publicly available at https://github.com/sagiebenaim/OneShotTranslation.
Recent progress in deep learning for audio synthesis opens the way to models that directly produce the waveform, shifting away from the traditional paradigm of relying on vocoders or MIDI synthesizers for speech or music generation. Despite their successes, current state-of-the-art neural audio synthesizers such as WaveNet and SampleRNN suffer from prohibitive training and inference times because they are based on autoregressive models that generate audio samples one at a time at a rate of 16kHz. In this work, we study the more computationally efficient alternative of generating the waveform frame-by-frame with large strides. We present SING, a lightweight neural audio synthesizer for the original task of generating musical notes given desired instrument, pitch and velocity. Our model is trained end-to-end to generate notes from nearly 1000 instruments with a single decoder, thanks to a new loss function that minimizes the distances between the log spectrograms of the generated and target waveforms. On the generalization task of synthesizing notes for pairs of pitch and instrument not seen during training, SING produces audio with significantly improved perceptual quality compared to a state-of-the-art autoencoder based on WaveNet, as measured by a Mean Opinion Score (MOS), and is about 32 times faster for training and 2,500 times faster for inference.
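The log-spectrogram loss mentioned above can be sketched in a few lines: compare the log-power spectra of two waveforms frame by frame. The plain non-overlapping FFT frames below are a simplification of a real STFT pipeline, and the frame size and epsilon are illustrative choices, not the paper's values.

```python
import numpy as np

# Toy spectral loss in the spirit of SING's objective: mean squared
# difference between log-power spectrograms of two waveforms.
def log_spectrogram(wav, frame=256, eps=1e-6):
    n = len(wav) // frame
    frames = wav[: n * frame].reshape(n, frame)      # non-overlapping frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + eps)

def spectral_loss(x, y):
    return np.mean((log_spectrogram(x) - log_spectrogram(y)) ** 2)

t = np.arange(4096) / 16000.0                        # 16 kHz sample grid
a440 = np.sin(2 * np.pi * 440 * t)                   # concert A
a445 = np.sin(2 * np.pi * 445 * t)                   # slightly detuned A
noise = np.random.default_rng(3).normal(size=4096)
print(spectral_loss(a440, a445) < spectral_loss(a440, noise))
```

Unlike a sample-wise waveform loss, this objective is insensitive to phase shifts between generated and target audio, which is what makes it suitable for training a frame-by-frame synthesizer end-to-end.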
Pierre Thodoroff, Audrey Durand, Joelle Pineau and Doina Precup
Several applications of Reinforcement Learning suffer from instability due to high variance. This is especially prevalent in high dimensional domains. Regularization is a commonly used technique in machine learning to reduce variance, at the cost of introducing some bias. Most existing regularization techniques focus on spatial (perceptual) regularization. Yet in reinforcement learning, due to the nature of the Bellman equation, there is an opportunity to also exploit temporal regularization based on smoothness in value estimates over trajectories. This paper explores a class of methods for temporal regularization. We formally characterize the bias induced by this technique using Markov chain concepts. We illustrate the various characteristics of temporal regularization via a sequence of simple discrete and continuous MDPs, and show that the technique provides improvement even in high-dimensional Atari games.
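A minimal sketch of the idea: mix each TD target with the value estimate of the previous state along the trajectory, trading a small bias for lower-variance updates. The smoothing coefficient `beta` and the chain-shaped toy states are illustrative assumptions; this is not the paper's exact estimator.

```python
# Toy temporal regularization of value estimates along one trajectory:
# blend the TD target with the previous state's value before updating.
def td_updates(rewards, values, beta, alpha=0.1, gamma=0.99):
    v = dict(values)
    prev_v = 0.0
    for t, r in enumerate(rewards):
        s, s_next = t, t + 1                      # states visited in order
        target = r + gamma * v.get(s_next, 0.0)   # usual TD(0) target
        smoothed = (1 - beta) * target + beta * prev_v   # temporal smoothing
        v[s] = v[s] + alpha * (smoothed - v[s])
        prev_v = v[s]
    return v

rewards = [1.0, 0.0, 1.0, 0.0]
v_plain = td_updates(rewards, {t: 0.0 for t in range(5)}, beta=0.0)  # no regularization
v_reg = td_updates(rewards, {t: 0.0 for t in range(5)}, beta=0.5)    # regularized
print(v_plain[0], v_reg[0])
```

With beta = 0 this reduces to plain TD(0); larger beta shrinks each update toward the value of the preceding state, which is the smoothness-over-trajectories prior the abstract describes.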
Other activities at NeurIPS 2018
AI for Social Good Workshop
Announcing the Winners of the Conversational Intelligence Challenge 2 (ConvAI2)
Team: Mikhail Burtsev, Varvara Logacheva, Valentin Malykh, Ryan Lowe, Iulian Serban, Shrimai Prabhumoye, Emily Dinan, Douwe Kiela, Alexander Miller, Kurt Shuster, Arthur Szlam, Jack Urbanek and Jason Weston
Advisory board: Yoshua Bengio, Alan W. Black, Joelle Pineau, Alexander Rudnicky and Jason Williams
Bayesian Deep Learning Workshop
Paper: Bayesian Neural Networks using HackPPL with Application to User Location State Prediction
Beliz Gokkaya, Jessica Ai, Michael Tingley, Yonglong Zhang, Ning Dong, Thomas Jiang, Anitha Kubendran and Arun Kumar
Bayesian Nonparametrics Workshop
Ben Letham, invited speaker
Black in AI Workshop
Yann Dauphin, speaker
Causal Learning Workshop
Paper: Causality in Physics and Effective Theories of Agency
Dan Roberts and Max Kleiman-Weiner
Continual Learning Workshop
Marc’Aurelio Ranzato, speaker
Conversational AI Workshop
Paper: Cross-lingual contextual word representations for multilingual slot filling
Sonal Gupta, Rushin Shah and Sebastian Shuster
Paper: Improving Semantic Parsing for Task Oriented Dialog
Critiquing and Correcting Trends in Machine Learning Workshop
Kim Hazelwood, invited speaker
Deep Reinforcement Learning Workshop
Paper: Deep Counterfactual Regret Minimization
Noam Brown, Adam Lerer, Sam Gross, Tuomas Sandholm
Ethical, Social and Governance Issues in AI Workshop
Emergent Communication Workshop
Douwe Kiela and Kyunghyun Cho, organizers
Paper: Learning to communicate at scale in multiagent cooperative and competitive tasks
Amanpreet Singh, Tushar Jain and Sainbayar Sukhbaatar
Integration of Deep Learning Theories Workshop
Paper: SGD Implicitly Regularizes Generalization Error
Learning by Instruction Workshop
Jason Weston, speaker
Machine Learning for Creativity and Design Workshop
Yaniv Taigman, speaker
Machine Learning for Molecules and Materials Workshop
Kyunghyun Cho, speaker
Reinforcement Learning under Partial Observability Workshop
Joelle Pineau, speaker
Paper: High-Level Strategy Selection under Partial Observability in StarCraft: Brood War
Jonas Gehring, Da Ju, Vegard Mella, Daniel Gant, Nicolas Usunier and Gabriel Synnaeve
Paper: Learning Minimal Sufficient Representations of Partially Observable Decision Processes
Tommaso Furlanello, Amy Zhang, Kamyar Azizzadenesheli, Anima Anandkumar, Zachary C. Lipton, Laurent Itti and Joelle Pineau
Relational Representation Learning Workshop
Maximilian Nickel, speaker
Paper: Compositional Language Understanding with Text-based Relational Reasoning
Koustuv Sinha, Shagun Sodhani, William L. Hamilton and Joelle Pineau
Reproducible, Reusable, and Robust Reinforcement Learning (invited talk)
Abstract: We have seen significant achievements with deep reinforcement learning in recent years. Yet reproducing results for state-of-the-art deep RL methods is seldom straightforward. High variance of some methods can make learning particularly difficult when environments or rewards are strongly stochastic. Furthermore, results can be brittle to even minor perturbations in the domain or experimental procedure. In this talk, I will review challenges that arise in experimental techniques and reporting procedures in deep RL. I will also describe several recent results and guidelines designed to make future results more reproducible, reusable and robust.
Research to Production in NLP Applications at Facebook (Expo Workshop)
Second Conversational AI Workshop
Alborz Geramifard, chair
Smooth Games Optimization and Machine Learning Workshop
Paper: A Variational Inequality Perspective on GANs
Gauthier Gidel, Hugo Berard, Gaëtan Vignoud, Pascal Vincent and Simon Lacoste-Julien
Systems for Machine Learning Workshop
Paper: Explore-Exploit: A Framework for Interactive and Online Learning
Honglei Liu, Anuj Kumar, Wenhai Yang and Benoit Dumoulin
Paper: Stochastic Gradient Push for Distributed Deep Learning
Mido Assran, Nicolas Loizou, Nicolas Ballas and Mike Rabbat
Unsupervised Deep Learning Workshop
Marc’Aurelio Ranzato, co-presenter
Visually Grounded Interaction and Language (ViGIL) Workshop
Paper: Embodied Question Answering in Photorealistic Environments with Point Cloud Perception
Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Dhruv Batra and Devi Parikh
Wordplay: Reinforcement and Language Learning in Text-based Games Workshop
Jason Weston, speaker
Women in Machine Learning Workshop