December 7, 2019

Unsupervised Object Segmentation by Redrawing

Neural Information Processing Systems (NeurIPS)

Object segmentation is a crucial problem that is usually solved with supervised learning over very large datasets composed of both images and corresponding object masks. Since the masks have to be provided at pixel level, building such a dataset for any new domain can be very time-consuming. We present ReDO, a new model that extracts objects from images in a fully unsupervised way, without any annotations.

By: Mickaël Chen, Thierry Artières, Ludovic Denoyer
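
The redrawing idea lends itself to a compact sketch: a mask network segments the image, a generator redraws the masked region from noise, and only the composite is judged for realism, so the mask network never sees ground-truth masks. The toy PyTorch sketch below uses illustrative placeholder architectures, not the authors' models.

```python
# Minimal sketch of the redrawing idea (not the authors' code): a mask
# network segments an image, one region is redrawn by a generator, and
# the composite would be scored by a GAN discriminator. All
# architectures here are illustrative placeholders.
import torch
import torch.nn as nn

class MaskNet(nn.Module):          # f: image -> soft object mask in [0, 1]
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())
    def forward(self, x):
        return self.net(x)

class RegionGen(nn.Module):        # G: (image, noise) -> redrawn pixels
    def __init__(self, z_dim=8):
        super().__init__()
        self.net = nn.Conv2d(3 + z_dim, 3, 3, padding=1)
    def forward(self, x, z):
        z = z[:, :, None, None].expand(-1, -1, x.size(2), x.size(3))
        return torch.tanh(self.net(torch.cat([x, z], dim=1)))

f, G = MaskNet(), RegionGen()
x = torch.randn(4, 3, 64, 64)                  # batch of images
m = f(x)                                       # soft mask
z = torch.randn(4, 8)
redrawn = m * G(x, z) + (1 - m) * x            # redraw object, keep background
# `redrawn` is fed to a discriminator during training; at test time,
# f(x) alone yields the unsupervised segmentation.
```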

December 2, 2019

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Neural Information Processing Systems (NeurIPS)

In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.

By: Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala
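
The imperative, define-by-run style the paper describes is visible in a few lines: models are ordinary Python objects, training loops are plain Python control flow, and autograd records operations as they execute. A minimal example:

```python
# A small example of the imperative style the paper describes:
# everything is a regular Python program, with gradients recorded
# dynamically as operations run.
import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(100):
    x = torch.randn(32, 10)
    y = x.sum(dim=1, keepdim=True)        # synthetic regression target
    loss = ((model(x) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()                        # gradients via dynamic autograd
    opt.step()
    if step % 20 == 0:                     # ordinary Python control flow
        print(step, loss.item())
```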

December 2, 2019

Hierarchical Decision Making by Generating and Following Natural Language Instructions

Neural Information Processing Systems (NeurIPS)

We explore using latent natural language instructions as an expressive and compositional representation of complex actions for hierarchical decision making. Rather than directly selecting micro-actions, our agent first generates a latent plan in natural language, which is then executed by a separate model. We introduce a challenging real-time strategy game environment in which the actions of a large number of units must be coordinated across long time scales.

By: Hengyuan Hu, Denis Yarats, Qucheng Gong, Yuandong Tian, Mike Lewis
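
As a rough illustration of the decomposition (vocabulary, sizes, and architectures below are invented for the sketch, not taken from the paper), an "instructor" maps an observation to instruction tokens and a separate "executor" conditions on those tokens to choose a micro-action:

```python
# Toy sketch of the two-stage setup: an instructor generates a latent
# plan as tokens, and an executor conditions on it to act. All sizes
# and models here are made up for illustration.
import torch
import torch.nn as nn

VOCAB, OBS, ACTIONS = 100, 32, 8

class Instructor(nn.Module):       # observation -> instruction tokens
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(OBS, 5 * VOCAB)   # 5-token instruction
    def forward(self, obs):
        logits = self.head(obs).view(-1, 5, VOCAB)
        return logits.argmax(-1)                # greedy decoding

class Executor(nn.Module):         # (observation, instruction) -> action
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 16)
        self.head = nn.Linear(OBS + 16, ACTIONS)
    def forward(self, obs, instr):
        lang = self.embed(instr).mean(dim=1)    # pool instruction tokens
        return self.head(torch.cat([obs, lang], dim=-1)).argmax(-1)

obs = torch.randn(1, OBS)
plan = Instructor()(obs)           # latent plan in "language" space
action = Executor()(obs, plan)     # executed by a separate model
```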

December 1, 2019

Regret Bounds for Learning State Representations in Reinforcement Learning

Neural Information Processing Systems (NeurIPS)

We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent. At least one of these representations is assumed to induce a Markov decision process (MDP), and the performance of the agent is measured in terms of cumulative regret against the optimal policy giving the highest average reward in this MDP representation.

By: Ronald Ortner, Matteo Pirotta, Ronan Fruit, Alessandro Lazaric, Odalric-Ambrym Maillard
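
The regret notion used here is concrete enough to compute: after T steps, it is T times the optimal average reward (gain) of the MDP induced by the Markov representation, minus the reward the agent actually collected. A minimal illustration (numbers are invented):

```python
# Illustrative computation of cumulative regret against the optimal
# average reward of the (unknown) Markov representation's MDP.
def cumulative_regret(optimal_gain, collected_rewards):
    """optimal_gain: average reward of the best policy in the MDP
    induced by the Markov representation.
    collected_rewards: rewards the agent actually received."""
    T = len(collected_rewards)
    return T * optimal_gain - sum(collected_rewards)

# e.g. an agent collecting 0.4 per step in an MDP with gain 0.5
print(cumulative_regret(0.5, [0.4] * 1000))   # -> 100.0 regret
```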

December 1, 2019

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

Neural Information Processing Systems (NeurIPS)

In this paper, we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard.

By: Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
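
For readers who want to poke at the tasks, the benchmark is easy to load with the third-party Hugging Face datasets library (an assumption for this sketch; the paper ships its own toolkit):

```python
# Inspecting one SuperGLUE task via the Hugging Face `datasets`
# library (a third-party tool, not the paper's own toolkit).
from datasets import load_dataset

boolq = load_dataset("super_glue", "boolq")   # one of the SuperGLUE tasks
example = boolq["train"][0]
print(example["question"])
print(example["passage"][:200])
print(example["label"])                        # 0 = no, 1 = yes
```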

December 1, 2019

Efficient Representation and Sparse Sampling of Head-Related Transfer Functions Using Phase-Correction Based on Ear Alignment

IEEE Transactions on Audio, Speech, and Language Processing (TASLP)

In this paper, a new method for pre-processing head-related transfer functions (HRTFs) in order to reduce their effective spherical-harmonics (SH) order is presented. The method uses phase-correction based on ear alignment, exploiting the dual-centering nature of HRTF measurements. In contrast to time-alignment, the phase-correction is performed parametrically, making it more robust to measurement noise. The SH order reduction and the ensuing interpolation errors due to sparse sampling are analyzed for both phase-corrected and time-aligned HRTFs.

By: Zamir Ben-Hur, David Lou Alon, Ravish Mehra, Boaz Rafaely
Areas: AR/VR
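
A conceptual sketch of the parametric correction (head radius, coordinate conventions, and the dummy spectra below are assumptions for illustration, not the paper's code): each HRTF is multiplied by exp(i·k·r·cosΘ), where k is the wavenumber, r an assumed head radius, and Θ the angle between the source direction and the ear, removing most of the linear phase that inflates the effective SH order.

```python
# Conceptual sketch of phase-correction by ear alignment. Parameter
# values and coordinate conventions are assumptions for illustration.
import numpy as np

c = 343.0                 # speed of sound [m/s]
r = 0.0875                # assumed head radius [m]

def phase_correct(hrtf, freqs, src_dirs, ear_dir):
    """hrtf: (n_dirs, n_freqs) complex HRTF spectra
    freqs: (n_freqs,) frequencies [Hz]
    src_dirs, ear_dir: unit vectors, (n_dirs, 3) and (3,)."""
    k = 2 * np.pi * freqs / c                    # wavenumbers [1/m]
    cos_theta = src_dirs @ ear_dir               # (n_dirs,)
    correction = np.exp(1j * np.outer(cos_theta, k) * r)
    return hrtf * correction

# left ear on the +y axis of a head-centered coordinate system
freqs = np.linspace(100, 16000, 256)
dirs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
hrtf = np.ones((2, 256), dtype=complex)          # dummy spectra
aligned = phase_correct(hrtf, freqs, dirs, np.array([0.0, 1.0, 0.0]))
```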

November 30, 2019

Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs

Neural Information Processing Systems (NeurIPS)

The exploration bonus is an effective approach to managing the exploration-exploitation trade-off in Markov Decision Processes (MDPs). While it has been analyzed in infinite-horizon discounted and finite-horizon problems, we focus on designing and analyzing the exploration bonus in the more challenging infinite-horizon undiscounted setting.

By: Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric
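
The bonus mechanism itself is simple to sketch (the constants and exact form below are illustrative, not the paper's bonus): rarely visited state-action pairs get their empirical rewards inflated, steering planning toward them.

```python
# Minimal sketch of a count-based exploration bonus: the agent plans on
# an optimistic model whose rewards are inflated by a term that shrinks
# as a state-action pair is visited more often. The form and constants
# are illustrative, not the paper's exact bonus.
import math

def bonus(n_visits, t, scale=1.0):
    """Optimism bonus for a state-action pair visited n_visits times
    by step t; larger for rarely visited pairs."""
    return scale * math.sqrt(math.log(max(t, 2)) / max(n_visits, 1))

def optimistic_reward(empirical_mean_reward, n_visits, t):
    return empirical_mean_reward + bonus(n_visits, t)

print(optimistic_reward(0.3, n_visits=4, t=1000))    # rarely visited
print(optimistic_reward(0.3, n_visits=400, t=1000))  # well explored
```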

November 30, 2019

MVFST-RL: An Asynchronous RL Framework for Congestion Control with Delayed Actions

Workshop on ML for Systems at NeurIPS

We present MVFST-RL, a scalable framework for congestion control in the QUIC transport protocol that leverages the state of the art in asynchronous RL training with off-policy correction. We analyze modeling improvements to mitigate the deviation from Markovian dynamics and evaluate our method on emulated networks from the Pantheon benchmark platform.

By: Viswanath Sivakumar, Tim Rocktäschel, Alexander H. Miller, Heinrich Küttler, Nantas Nardelli, Mike Rabbat, Joelle Pineau, Sebastian Riedel
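
The "delayed actions" issue can be illustrated with a toy non-blocking loop (a schematic stand-in, not MVFST-RL's actual architecture): the sender never waits for the agent, so each action takes effect one decision interval late.

```python
# Toy illustration of asynchronous action updates: the "network" keeps
# sending while the agent computes its next congestion-window action in
# the background. This is a schematic stand-in, not MVFST-RL itself.
import queue
import threading
import time

actions = queue.Queue()

def agent():
    cwnd = 10
    while True:
        time.sleep(0.01)            # stand-in for policy inference latency
        cwnd = max(1, cwnd + 1)     # placeholder "policy"
        actions.put(cwnd)

threading.Thread(target=agent, daemon=True).start()

cwnd = 10
for step in range(5):               # the sender never blocks on the agent
    try:
        cwnd = actions.get_nowait() # apply the latest (delayed) action
    except queue.Empty:
        pass                        # no new action yet; keep current cwnd
    print(f"step {step}: sending with cwnd={cwnd}")
    time.sleep(0.005)
```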

November 27, 2019

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

Neural Information Processing Systems (NeurIPS)

State-of-the-art efficient model-based Reinforcement Learning (RL) algorithms typically act by iteratively solving empirical models, i.e., by performing full planning on Markov Decision Processes (MDPs) built from the gathered experience. In this paper, we focus on model-based RL in the finite-state, finite-horizon undiscounted MDP setting and establish that exploring with greedy policies (acting by 1-step planning) can achieve tight minimax performance in terms of regret, Õ(√HSAT).

By: Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor
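
The contrast with full planning is easy to see in code (the toy MDP machinery below is assumed for illustration): a greedy agent performs a single Bellman backup at the current state instead of solving the whole empirical MDP.

```python
# Sketch of greedy 1-step planning on an empirical model: one Bellman
# backup at the visited state, then act greedily. Toy MDP machinery;
# not the paper's algorithm or analysis.
import numpy as np

def one_step_planning(Q, s, R_hat, P_hat, V, gamma=1.0):
    """One Bellman backup at state s using the empirical model
    (R_hat[s, a] rewards, P_hat[s, a] next-state distributions) and the
    current value estimates V, then return the greedy action."""
    Q[s] = R_hat[s] + gamma * P_hat[s] @ V      # back up all actions at s
    V[s] = Q[s].max()
    return int(Q[s].argmax())                   # greedy action

S, A = 5, 3
Q, V = np.zeros((S, A)), np.zeros(S)
R_hat = np.random.rand(S, A)                            # empirical rewards
P_hat = np.random.dirichlet(np.ones(S), size=(S, A))    # (S, A, S) model
a = one_step_planning(Q, 0, R_hat, P_hat, V)
```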

November 25, 2019

Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller

Neural Information Processing Systems (NeurIPS)

We study a generalized setup for learning from demonstration in which an agent must manipulate novel objects in unseen scenarios by looking at only a single video of a human demonstration from a third-person perspective. To accomplish this goal, the agent must not only learn to understand the intent of the demonstrated third-person video in its context, but also perform the intended task in its own environment configuration.

By: Pratyusha Sharma, Deepak Pathak, Abhinav Gupta
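
Schematically (the tiny networks below are invented for illustration, not the paper's models), a decoupled hierarchical controller splits the problem in two: a high-level module reads the demonstration video and predicts a visual subgoal, and a low-level controller maps the robot's own observation plus that subgoal to a motor command.

```python
# Schematic sketch of a decoupled hierarchical controller; all
# architectures and sizes are invented for illustration.
import torch
import torch.nn as nn

feat = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))  # shared encoder
high = nn.Linear(64, 64)        # demo-frame feature -> predicted subgoal
low = nn.Linear(64 + 64, 4)     # (robot feature, subgoal) -> 4-DoF action

demo_frame = torch.randn(1, 3, 32, 32)   # frame from third-person video
robot_obs = torch.randn(1, 3, 32, 32)    # robot's first-person view

subgoal = high(feat(demo_frame))                         # what to achieve
action = low(torch.cat([feat(robot_obs), subgoal], -1))  # how to achieve it
```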