
December 3, 2018

Explore-Exploit: A Framework for Interactive and Online Learning

Systems for Machine Learning Workshop at NeurIPS 2018

We present Explore-Exploit, a framework designed to collect and utilize user feedback in an interactive and online setting while minimizing regressions in end-user experience. The framework provides a suite of online learning operators for tasks such as personalized ranking, candidate selection, and active learning.

By: Honglei Liu, Anuj Kumar, Wenhai Yang, Benoit Dumoulin
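
To make the explore-exploit idea concrete, here is a minimal epsilon-greedy candidate-selection operator in Python. This is an illustrative sketch only: the class name, interface, and defaults are hypothetical and are not the framework's actual API.

```python
import random

class EpsilonGreedySelector:
    """Minimal epsilon-greedy candidate-selection operator.

    Hypothetical interface for illustration; not the Explore-Exploit
    framework's actual API.
    """

    def __init__(self, candidates, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {c: 0 for c in candidates}
        self.rewards = {c: 0.0 for c in candidates}

    def select(self):
        # Explore a random candidate with probability epsilon;
        # otherwise exploit the highest empirical mean reward.
        if random.random() < self.epsilon:
            return random.choice(list(self.counts))
        return max(self.counts,
                   key=lambda c: self.rewards[c] / max(self.counts[c], 1))

    def update(self, candidate, reward):
        # Fold observed user feedback back into the running estimates.
        self.counts[candidate] += 1
        self.rewards[candidate] += reward
```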

December 3, 2018

High-Level Strategy Selection under Partial Observability in StarCraft: Brood War

Reinforcement Learning under Partial Observability Workshop at NeurIPS 2018

We consider the problem of high-level strategy selection in the adversarial setting of real-time strategy games from a reinforcement learning perspective, where taking an action corresponds to switching to the respective strategy.

By: Jonas Gehring, Da Ju, Vegard Mella, Daniel Gant, Nicolas Usunier, Gabriel Synnaeve
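
As a deliberately simplified illustration of "taking an action corresponds to switching strategy", one could treat each high-level strategy as an arm of a bandit and select by UCB1. The paper itself learns this choice with reinforcement learning under partial observability, so the sketch below (names and constants ours) conveys only the action space, not the paper's method.

```python
import math

def ucb1_strategy(counts, wins, t, c=2.0):
    """Pick a high-level strategy (e.g., a build order) by UCB1.

    counts[s]: times strategy s was played; wins[s]: games won with s;
    t: total games so far. A bandit simplification, not the paper's
    partially observable RL formulation.
    """
    for s in counts:
        if counts[s] == 0:
            return s  # play every strategy at least once
    return max(counts, key=lambda s: wins[s] / counts[s]
               + math.sqrt(c * math.log(t) / counts[s]))
```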

December 3, 2018

Fighting Boredom in Recommender Systems with Linear Reinforcement Learning

Neural Information Processing Systems (NeurIPS)

A common assumption in recommender systems (RS) is the existence of a single best fixed recommendation strategy. Such a strategy may be simple and work at the item level (e.g., multi-armed bandits assume one best fixed arm/item exists) or implement a more sophisticated RS (e.g., the objective of A/B testing is to find the best fixed RS and execute it thereafter). We argue that this assumption is rarely verified in practice, as the recommendation process itself may impact the user's preferences.

By: Romain Warlop, Alessandro Lazaric, Jérémie Mary

December 3, 2018

Forward Modeling for Partial Observation Strategy Games – A StarCraft Defogger

Neural Information Processing Systems (NeurIPS)

We formulate the problem of defogging as state estimation and future state prediction from previous, partial observations in the context of real-time strategy games. We propose to employ encoder-decoder neural networks for this task, and introduce proxy tasks and baselines for evaluation to assess their ability to capture basic game rules and high-level dynamics.

By: Gabriel Synnaeve, Zeming Lin, Jonas Gehring, Dan Gant, Vegard Mella, Vasil Khalidov, Nicolas Carion, Nicolas Usunier

December 3, 2018

Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes

Neural Information Processing Systems (NeurIPS)

While designing the state space of an MDP, it is common to include states that are transient or not reachable by any policy (e.g., in mountain car, the product space of speed and position contains configurations that are not physically reachable). This results in weakly-communicating or multi-chain MDPs. In this paper, we introduce TUCRL, the first algorithm able to perform efficient exploration-exploitation in any finite Markov Decision Process (MDP) without requiring any form of prior knowledge.

By: Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

December 3, 2018

Temporal Regularization in Markov Decision Process

Neural Information Processing Systems (NeurIPS)

This paper explores a class of methods for temporal regularization. We formally characterize the bias induced by this technique using Markov chain concepts. We illustrate the various characteristics of temporal regularization via a sequence of simple discrete and continuous MDPs, and show that the technique provides improvement even in high-dimensional Atari games.

By: Pierre Thodoroff, Audrey Durand, Joelle Pineau, Doina Precup
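
One natural instance of such temporal regularization (a sketch in our notation, not necessarily the paper's exact operator) smooths value estimates along the trajectory with an exponential average:

```latex
\[
\tilde{v}_\beta(s_t) = (1-\beta)\, v(s_t) + \beta\, \tilde{v}_\beta(s_{t-1}),
\qquad \beta \in [0,1].
\]
```

Setting \(\beta = 0\) recovers the unregularized estimate; larger \(\beta\) trades variance for a bias that, as the abstract notes, can be characterized through the mixing properties of the underlying Markov chain.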

December 3, 2018

Training with Low-precision Embedding Tables

Systems for Machine Learning Workshop at NeurIPS 2018

In this work, we focus on building a system to train continuous embeddings in low-precision floating-point representation. Specifically, our system performs SGD-style model updates in single-precision arithmetic, casts the updated parameters using stochastic rounding, and stores the parameters in half-precision floating point.

By: Jian Zhang, Jiyan Yang, Hector Yuen
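
A minimal sketch of the rounding step described above, assuming NumPy (our illustration, not the paper's system code): update in fp32, then cast to fp16 by rounding to a neighboring representable value with probability proportional to proximity, so the cast is unbiased in expectation.

```python
import numpy as np

def stochastic_round_fp16(x, rng=None):
    """Cast fp32 values to fp16 with stochastic rounding.

    Rounds each value to one of its two bracketing fp16 neighbors with
    probability proportional to proximity, so E[result] == x.
    Illustrative sketch; not the paper's actual implementation.
    """
    rng = rng or np.random.default_rng()
    x = np.asarray(x, dtype=np.float32)
    lo = x.astype(np.float16)                    # round-to-nearest fp16
    lo32 = lo.astype(np.float32)
    # fp16 neighbor on the far side of x from lo
    toward = np.where(x >= lo32, np.float16(np.inf), np.float16(-np.inf))
    hi = np.nextafter(lo, toward)
    gap = hi.astype(np.float32) - lo32
    # Probability of rounding away from lo, proportional to |x - lo|.
    p = np.where(gap != 0, (x - lo32) / np.where(gap == 0, 1, gap), 0.0)
    return np.where(rng.random(x.shape) < p, hi, lo)
```

Because the rounding is unbiased in expectation, small SGD steps that fall below the fp16 rounding threshold are not systematically lost, which is the point of using stochastic rounding here.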

December 2, 2018

One-Shot Unsupervised Cross Domain Translation

Neural Information Processing Systems (NeurIPS)

Given a single image x from domain A and a set of images from domain B, our task is to generate the analog of x in B. We argue that this task could be a key AI capability that underlies the ability of cognitive agents to act in the world, and we present empirical evidence that existing unsupervised domain translation methods fail on this task.

By: Sagie Benaim, Lior Wolf

December 2, 2018

The Description Length of Deep Learning Models

Neural Information Processing Systems (NeurIPS)

We demonstrate experimentally the ability of deep neural networks to compress the training data even when accounting for parameter encoding. The compression viewpoint originally motivated the use of variational methods in neural networks (Hinton and Van Camp, 1993; Schmidhuber, 1997).

By: Léonard Blier, Yann Ollivier
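
For background on the compression viewpoint (standard two-part MDL, in our notation rather than the paper's): the description length of the labels given the inputs is

```latex
\[
L(y_{1:n} \mid x_{1:n}) \;=\; L(\theta) \;+\; \sum_{i=1}^{n} -\log_2 p_\theta(y_i \mid x_i),
\]
```

i.e., the bits needed to encode the parameters plus the bits needed to encode the labels under the model. The network "compresses" the training data when this total falls below the roughly \(n \log_2 K\) bits of a uniform code over \(K\) classes.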

November 30, 2018

A Block Coordinate Ascent Algorithm for Mean-Variance Optimization

Neural Information Processing Systems (NeurIPS)

Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare. The mean-variance function is one of the most widely used objective functions in risk management due to its simplicity and interpretability. Existing algorithms for mean-variance optimization are based on multi-time-scale stochastic approximation, whose learning rate schedules are often hard to tune and which have only asymptotic convergence proofs. In this paper, we develop a model-free policy search framework for mean-variance optimization with finite-sample error bound analysis (to local optima).

By: Tengyang Xie, Bo Liu, Yangyang Xu, Mohammad Ghavamzadeh, Yinlam Chow, Daoming Lyu, Daesub Yoon
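
For reference, the mean-variance objective the abstract refers to is typically written as (the \(\lambda\) notation is ours):

```latex
\[
\max_{\theta} \; J(\theta) \;=\; \mathbb{E}\big[R^{\pi_\theta}\big] \;-\; \lambda\, \mathrm{Var}\big(R^{\pi_\theta}\big),
\]
```

where \(R^{\pi_\theta}\) is the random cumulative reward under policy \(\pi_\theta\) and \(\lambda \ge 0\) sets the trade-off between expected return and risk.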