
December 1, 2019

Regret Bounds for Learning State Representations in Reinforcement Learning

Neural Information Processing Systems (NeurIPS)

We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent. At least one of these representations is assumed to induce a Markov decision process (MDP), and the performance of the agent is measured in terms of cumulative regret against the optimal policy giving the highest average reward in this MDP representation.

By: Ronald Ortner, Matteo Pirotta, Ronan Fruit, Alessandro Lazaric, Odalric-Ambrym Maillard

November 30, 2019

MVFST-RL: An Asynchronous RL Framework for Congestion Control with Delayed Actions

Workshop on ML for Systems at NeurIPS

We present MVFST-RL, a scalable framework for congestion control in the QUIC transport protocol that leverages state-of-the-art asynchronous RL training with off-policy correction. We analyze modeling improvements to mitigate the deviation from Markovian dynamics, and evaluate our method on emulated networks from the Pantheon benchmark platform.

By: Viswanath Sivakumar, Tim Rocktäschel, Alexander H. Miller, Heinrich Küttler, Nantas Nardelli, Mike Rabbat, Joelle Pineau, Sebastian Riedel

November 30, 2019

Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs

Neural Information Processing Systems (NeurIPS)

The exploration bonus is an effective approach to manage the exploration-exploitation trade-off in Markov Decision Processes (MDPs). While it has been analyzed in infinite-horizon discounted and finite-horizon problems, we focus on designing and analyzing the exploration bonus in the more challenging infinite-horizon undiscounted setting.
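The exploration-bonus idea above can be illustrated with a minimal count-based sketch: an optimism term that shrinks as a state-action pair is visited more often. The UCB-style form and the `beta` parameter below are generic illustrative choices, not the paper's exact construction.

```python
import math

def exploration_bonus(visit_count, t, beta=1.0):
    """Count-based optimism bonus added to empirical rewards.

    Decreases as (state, action) visit counts grow, encouraging the
    agent to try under-explored pairs. A generic UCB-style form,
    not the bonus analyzed in the paper.
    """
    return beta * math.sqrt(math.log(max(t, 2)) / max(visit_count, 1))
```

A rarely visited pair receives a larger bonus than a frequently visited one, which is what drives exploration toward uncertain regions of the MDP.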

By: Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

November 27, 2019

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

Neural Information Processing Systems (NeurIPS)

State-of-the-art efficient model-based Reinforcement Learning (RL) algorithms typically act by iteratively solving empirical models, i.e., by performing full planning on Markov Decision Processes (MDPs) built from the gathered experience. In this paper, we focus on model-based RL in the finite-state finite-horizon undiscounted MDP setting and establish that exploring with greedy policies – i.e., acting via 1-step planning – can achieve tight minimax performance in terms of regret, Õ(√HSAT).
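The 1-step-planning idea above can be sketched as a single backward-induction step on an empirical model, rather than solving the whole MDP: compute one-step Q-values against the next-stage value function and act greedily. This is a tabular illustration under assumed array shapes, not the algorithm analyzed in the paper.

```python
import numpy as np

def one_step_greedy_update(V_next, R, P):
    """One step of backward induction on an empirical model.

    Q(s, a) = R(s, a) + sum_s' P(s' | s, a) * V_next(s'),
    then act greedily in each state.

    R: (S, A) empirical rewards, P: (S, A, S) empirical transition
    probabilities, V_next: (S,) value estimate for the next stage.
    """
    Q = R + P @ V_next           # (S, A): contract over next states
    policy = Q.argmax(axis=1)    # greedy action per state
    V = Q.max(axis=1)            # updated value estimate
    return V, policy
```

Full planning would repeat this backup over every remaining stage; the paper's point is that doing only this single step per episode already suffices for minimax-tight regret.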

By: Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor

November 25, 2019

Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller

Neural Information Processing Systems (NeurIPS)

We study a generalized setup for learning from demonstration to build an agent that can manipulate novel objects in unseen scenarios by looking at only a single video of human demonstration from a third-person perspective. To accomplish this goal, our agent should not only learn to understand the intent of the demonstrated third-person video in its context but also perform the intended task in its environment configuration.

By: Pratyusha Sharma, Deepak Pathak, Abhinav Gupta

November 18, 2019

DeepFovea: Neural Reconstruction for Foveated Rendering and Video Compression using Learned Statistics of Natural Videos

ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia

Foveated rendering and compression can save computations by reducing the image quality in the peripheral vision. However, this can cause noticeable artifacts in the periphery, or, if done conservatively, would provide only modest savings. In this work, we explore a novel foveated reconstruction method that employs the recent advances in generative adversarial neural networks.

By: Anton S. Kaplanyan, Anton Sochenov, Thomas Leimkühler, Mikhail Okunev, Todd Goodall, Gizem Rufo

November 10, 2019

Adversarial Bandits with Knapsacks

Symposium on Foundations of Computer Science (FOCS)

We consider Bandits with Knapsacks (henceforth, BwK), a general model for multi-armed bandits under supply/budget constraints. In particular, a bandit algorithm needs to solve a well-known knapsack problem: find an optimal packing of items into a limited-size knapsack. The BwK problem is a common generalization of numerous motivating examples, which range from dynamic pricing to repeated auctions to dynamic ad allocation to network routing and scheduling.
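The supply constraint at the heart of BwK can be illustrated with a toy loop: each arm pull yields a reward but also consumes budget, and play stops once the knapsack is exhausted. This is a simplified stochastic illustration of the constraint only; the arm parameters are hypothetical and this is not the paper's adversarial algorithm.

```python
import random

def bwk_episode(budget, arms, max_pulls=1000, seed=0):
    """Toy Bandits-with-Knapsacks episode.

    arms: list of (reward, cost) pairs. Pulls arms uniformly at
    random until the budget cannot cover the next pull, illustrating
    how the knapsack constraint ends the episode early.
    """
    rng = random.Random(seed)
    total_reward = 0.0
    for _ in range(max_pulls):
        reward, cost = arms[rng.randrange(len(arms))]
        if cost > budget:
            break                # constraint binds: stop playing
        budget -= cost
        total_reward += reward
    return total_reward, budget
```

With a budget of 5 and a single arm of cost 1, the episode ends after exactly five pulls regardless of `max_pulls` – the regret analysis in BwK must account for this early stopping.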

By: Nicole Immorlica, Karthik Abinav Sankararaman, Robert Schapire, Aleksandrs Slivkins

November 7, 2019

Feature Selection for Facebook Feed Ranking System via a Group-Sparsity-Regularized Training Algorithm

Conference on Information and Knowledge Management (CIKM)

In modern production platforms, large-scale online learning models are applied to data of very high dimension. To save computational resources, it is important to have an efficient algorithm to select the most significant features from an enormous feature pool. In this paper, we propose a novel neural-network-suitable feature selection algorithm, which selects important features from the input layer during training.
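The group-sparsity idea above can be sketched with a generic group-lasso penalty on the input layer: each input feature's outgoing weights form one group, penalized by its L2 norm so that entire features are driven to zero together and can be pruned. The function names and the one-group-per-row convention below are illustrative assumptions, not the production algorithm.

```python
import numpy as np

def group_lasso_penalty(W, lam=0.01):
    """Group-sparsity regularizer on an input-layer weight matrix.

    W: (n_features, n_hidden). Each row (one input feature's outgoing
    weights) is a group; the penalty is lam * sum of row L2 norms,
    which zeroes out whole rows during training.
    """
    return lam * np.linalg.norm(W, axis=1).sum()

def selected_features(W, tol=1e-6):
    """Indices of features whose weight group survived training."""
    return np.flatnonzero(np.linalg.norm(W, axis=1) > tol)
```

After training with this penalty, features whose rows collapse to zero can be dropped from the input pipeline, which is where the computational savings come from.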

By: Xiuyan Ni, Yang Yu, Peng Wu, Youlin Li, Shaoliang Nie, Qichao Que, Chao Chen

November 5, 2019

Recommendation as a Communication Game: Self-Supervised Bot-Play for Goal-oriented Dialogue

Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this work, we collect a goal-driven recommendation dialogue dataset (GoRecDial), which consists of 9,125 dialogue games and 81,260 conversation turns between pairs of human workers recommending movies to each other. The task is specifically designed as a cooperative game between two players working towards a quantifiable common goal.

By: Dongyeop Kang, Anusha Balakrishnan, Pararth Shah, Paul A. Crook, Y-Lan Boureau, Jason Weston

November 5, 2019

CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text

Conference on Empirical Methods in Natural Language Processing (EMNLP)

The recent success of natural language understanding (NLU) systems has been troubled by results highlighting the failure of these models to generalize in a systematic and robust way. In this work, we introduce a diagnostic benchmark suite, named CLUTRR, to clarify some key issues related to the robustness and systematicity of NLU systems.

By: Koustuv Sinha, Shagun Sodhani, Jin Dong, Joelle Pineau, William L. Hamilton