December 8, 2019

Chasing Ghosts: Instruction Following as Bayesian State Tracking

Neural Information Processing Systems (NeurIPS)

A visually-grounded navigation instruction can be interpreted as a sequence of expected observations and actions an agent following the correct trajectory would encounter and perform. Based on this intuition, we formulate the problem of finding the goal location in Vision-and-Language Navigation (VLN) [1] within the framework of Bayesian state tracking – learning observation and motion models conditioned on these expected events.
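
As a rough illustration of the Bayesian state tracking framing, a discrete Bayes filter alternates a prediction step through the motion model with an update step through the observation model. The sketch below is generic, not the paper's model; motion_model and obs_likelihood stand in for the learned components.

    import numpy as np

    def bayes_filter_step(belief, action, observation, motion_model, obs_likelihood):
        """One predict/update cycle of a discrete Bayes filter.

        belief: (N,) prior probability over N discrete locations
        motion_model(s_next, s_prev, action): transition probability
        obs_likelihood(observation, s): likelihood of the observation at s
        """
        n = len(belief)
        # Predict: push the belief through the (learned) motion model.
        predicted = np.array([
            sum(motion_model(s_next, s_prev, action) * belief[s_prev]
                for s_prev in range(n))
            for s_next in range(n)
        ])
        # Update: reweight by the (learned) observation model, renormalize.
        posterior = np.array([obs_likelihood(observation, s) for s in range(n)]) * predicted
        return posterior / posterior.sum()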

By: Peter Anderson, Ayush Shrivastava, Devi Parikh, Dhruv Batra, Stefan Lee

December 7, 2019

Unsupervised Object Segmentation by Redrawing

Neural Information Processing Systems (NeurIPS)

Object segmentation is a crucial problem that is usually solved with supervised learning over very large datasets of images and corresponding object masks. Since the masks have to be provided at the pixel level, building such a dataset for any new domain can be very time-consuming. We present ReDO, a new model that extracts objects from images in a fully unsupervised way, without any annotations.
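
The "redrawing" idea can be sketched as a compositing operation: a mask network proposes an object region, a generator redraws only that region, and adversarial training requires the composite to remain realistic. All names below are illustrative assumptions, not the paper's API.

    def redraw(image, mask_net, generator, noise):
        # Soft object mask in [0, 1], predicted without any supervision.
        mask = mask_net(image)
        # New pixels for the object region, driven by a noise vector.
        new_object = generator(mask, noise)
        # Composite: only the masked region changes; a discriminator judges
        # whether the redrawn scene still looks realistic.
        return mask * new_object + (1 - mask) * image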

By: Mickaël Chen, Thierry Artières, Ludovic Denoyer

December 2, 2019

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Neural Information Processing Systems (NeurIPS)

In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.
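
The imperative, "regular Python" style is easy to see in a minimal training loop: the model is an ordinary Python object, the forward pass executes eagerly, and autograd records operations as they run.

    import torch

    # A model is an ordinary Python object; the forward pass is plain Python.
    model = torch.nn.Sequential(
        torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    x, y = torch.randn(32, 4), torch.randn(32, 1)
    for step in range(100):
        loss = torch.nn.functional.mse_loss(model(x), y)  # executes eagerly
        optimizer.zero_grad()
        loss.backward()   # autograd replays the operations it recorded
        optimizer.step()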

By: Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala

December 2, 2019

Hierarchical Decision Making by Generating and Following Natural Language Instructions

Neural Information Processing Systems (NeurIPS)

We explore using latent natural language instructions as an expressive and compositional representation of complex actions for hierarchical decision making. Rather than directly selecting micro-actions, our agent first generates a latent plan in natural language, which is then executed by a separate model. We introduce a challenging real-time strategy game environment in which the actions of a large number of units must be coordinated across long time scales.
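
A schematic of the two-level decision loop described above; the names and the fixed replanning interval are illustrative assumptions, not the paper's interface.

    def act(state, instructor, executor, plan, t, replan_interval=50):
        """One step of a two-level agent (schematic)."""
        if t % replan_interval == 0:
            # High level: generate a latent plan in natural language,
            # e.g. "build three archers and attack the closest enemy".
            plan = instructor.generate(state)
        # Low level: pick a micro-action conditioned on state and plan.
        return executor.select_action(state, plan), plan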

By: Hengyuan Hu, Denis Yarats, Qucheng Gong, Yuandong Tian, Mike Lewis

December 2, 2019

Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints

Workshop on Safety and Robustness in Decision Making at NeurIPS

In this paper, we propose a novel Thompson sampling algorithm for multi-outcome contextual bandit problems with auxiliary constraints. We empirically evaluate our algorithm on a synthetic problem. Lastly, we apply our method to a real world video transcoding problem and provide a practical way for navigating the trade-off between safety and performance using Bayesian optimization.
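
A minimal sketch of the idea, assuming Beta posteriors over a reward outcome and a separate safety outcome per arm; this is a generic constrained variant of Thompson sampling, not the authors' exact algorithm.

    import numpy as np

    rng = np.random.default_rng(0)

    def constrained_ts_step(reward_post, safety_post, threshold):
        """One round of a constrained Thompson sampling variant (sketch).

        reward_post, safety_post: per-arm Beta parameters (alpha, beta)
        for the reward outcome and the auxiliary safety outcome.
        threshold: minimum sampled safety probability for eligibility.
        """
        reward_draws = np.array([rng.beta(a, b) for a, b in reward_post])
        safety_draws = np.array([rng.beta(a, b) for a, b in safety_post])
        eligible = safety_draws >= threshold      # filter by sampled safety
        if not eligible.any():
            return int(np.argmax(safety_draws))   # fall back to safest arm
        masked = np.where(eligible, reward_draws, -np.inf)
        return int(np.argmax(masked))             # best eligible arm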

By: Sam Daulton, Shaun Singh, Vashist Avadhanula, Drew Dimmery, Eytan Bakshy

December 1, 2019

Regret Bounds for Learning State Representations in Reinforcement Learning

Neural Information Processing Systems (NeurIPS)

We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent. At least one of these representations is assumed to induce a Markov decision process (MDP), and the performance of the agent is measured in terms of cumulative regret against the optimal policy giving the highest average reward in this MDP representation.
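
For context, cumulative regret in this average-reward setting is typically measured against the optimal gain of the Markov representation. In standard notation (ours, not necessarily the paper's):

    \Delta(T) = T\,\rho^{*} - \sum_{t=1}^{T} r_t

where rho* is the highest achievable average reward in the MDP representation and r_t is the reward collected at step t; sublinear regret means the agent's average reward converges to rho*.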

By: Ronald Ortner, Matteo Pirotta, Ronan Fruit, Alessandro Lazaric, Odalric-Ambrym Maillard

December 1, 2019

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

Neural Information Processing Systems (NeurIPS)

In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard.

By: Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

December 1, 2019

Efficient Representation and Sparse Sampling of Head-Related Transfer Functions Using Phase-Correction Based on Ear Alignment

IEEE Transactions on Audio, Speech, and Language Processing (TASLP)

This paper presents a new method for pre-processing HRTFs in order to reduce their effective spherical harmonic (SH) order. The method uses phase-correction based on ear alignment, exploiting the dual-centering nature of HRTF measurements. In contrast to time-alignment, the phase-correction is performed parametrically, making it more robust to measurement noise. The SH order reduction and the ensuing interpolation errors due to sparse sampling were analyzed for both phase-correction and time-alignment.
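
The phase-correction step can be sketched as removing a direction-dependent linear phase tied to the ear position; the exact form below is our assumption based on the ear-alignment idea, not quoted from the paper:

    H_{\mathrm{pc}}(k, \Omega) = H(k, \Omega)\, e^{\, i k r_a \cos\Theta}

where k is the wavenumber, Omega the source direction, r_a the distance from the head center to the ear, and Theta the angle between the source direction and the ear direction. The residual, slowly varying phase then admits a much lower SH order.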

By: Zamir Ben-Hur, David Lou Alon, Ravish Mehra, Boaz Rafaely
Areas: AR/VR

November 30, 2019

Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs

Neural Information Processing Systems (NeurIPS)

The exploration bonus is an effective approach to managing the exploration-exploitation trade-off in Markov Decision Processes (MDPs). While it has been analyzed in infinite-horizon discounted and finite-horizon problems, we focus on designing and analyzing the exploration bonus in the more challenging infinite-horizon undiscounted setting.
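
The flavor of such a bonus can be sketched with a generic count-based term; this illustrates optimism in the face of uncertainty, not the authors' exact construction.

    import math

    def optimistic_reward(r_hat, visit_count, t, c=1.0):
        """Empirical reward inflated by a count-based exploration bonus.

        r_hat: empirical mean reward of the state-action pair
        visit_count: number of times the pair has been tried
        t: total steps so far; c: confidence scaling (illustrative)
        """
        bonus = c * math.sqrt(math.log(max(t, 2)) / max(visit_count, 1))
        return r_hat + bonus  # planning with these values induces optimism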

By: Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

November 30, 2019

MVFST-RL: An Asynchronous RL Framework for Congestion Control with Delayed Actions

Workshop on ML for Systems at NeurIPS

We present MVFST-RL, a scalable framework for congestion control in the QUIC transport protocol that leverages the state of the art in asynchronous RL training with off-policy correction. We analyze modeling improvements to mitigate the deviation from Markovian dynamics, and evaluate our method on emulated networks from the Pantheon benchmark platform.
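
The off-policy correction needed when actors act on stale policies can be illustrated with clipped importance ratios, in the spirit of V-trace-style corrections; this is a simplified sketch, not MVFST-RL's actual code.

    import torch

    def clipped_importance_weights(learner_logp, actor_logp, clip=1.0):
        """Per-step ratios between the learner's current policy and the
        stale actor policy that generated the trajectory, clipped to
        limit variance (simplified sketch).
        """
        rho = torch.exp(learner_logp - actor_logp)  # pi_learner / pi_actor
        return torch.clamp(rho, max=clip)           # truncate large ratios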

By: Viswanath Sivakumar, Tim Rocktäschel, Alexander H. Miller, Heinrich Küttler, Nantas Nardelli, Mike Rabbat, Joelle Pineau, Sebastian Riedel