December 9, 2019

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

Neural Information Processing Systems (NeurIPS)

We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the popular BERT architecture to a multi-modal two-stream model, processing both visual and textual inputs in separate streams that interact through co-attentional transformer layers.

By: Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee
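
At the heart of the two-stream design is a co-attentional transformer layer in which each stream's queries attend over the other stream's keys and values. The sketch below illustrates that interaction pattern only; the module structure and dimensions are assumptions, not the released implementation.

```python
# Minimal sketch of a co-attentional block in the spirit of ViLBERT: visual
# queries attend over textual keys/values and vice versa. Illustrative only.
import torch
import torch.nn as nn

class CoAttentionBlock(nn.Module):
    def __init__(self, dim=768, num_heads=12):
        super().__init__()
        self.vis_attends_txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.txt_attends_vis = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_v = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, visual, text):
        # Each stream keeps its own queries but reads keys/values from the other.
        v_out, _ = self.vis_attends_txt(query=visual, key=text, value=text)
        t_out, _ = self.txt_attends_vis(query=text, key=visual, value=visual)
        return self.norm_v(visual + v_out), self.norm_t(text + t_out)

# Example: 36 image-region features and a 20-token caption per sample.
visual, text = torch.randn(2, 36, 768), torch.randn(2, 20, 768)
visual, text = CoAttentionBlock()(visual, text)
```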

December 9, 2019

Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning

Neural Information Processing Systems (NeurIPS)

Multi-simulator training has contributed to the recent success of Deep Reinforcement Learning by stabilizing learning and allowing for higher training throughputs. We propose Gossip-based Actor-Learner Architectures (GALA) where several actor-learners (such as A2C agents) are organized in a peer-to-peer communication topology, and exchange information through asynchronous gossip in order to take advantage of a large number of distributed simulators.

By: Mahmoud Assran, Joshua Romoff, Nicolas Ballas, Joelle Pineau, Michael Rabbat
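
As a rough illustration of the gossip mechanism, the toy loop below mixes parameters between peers arranged on a ring; in GALA this exchange happens asynchronously alongside each agent's A2C updates, and the topology, mixing weight, and scheduling used here are assumptions.

```python
# Toy sketch of gossip-based parameter mixing over a directed ring of
# actor-learners; illustrative only (GALA performs this asynchronously).
import numpy as np

def gossip_round(params, mixing=0.5):
    """One gossip round: each agent blends its parameters with its ring neighbor's."""
    n = len(params)
    return [(1 - mixing) * params[i] + mixing * params[(i + 1) % n] for i in range(n)]

# Four agents with different parameter vectors drift toward consensus.
agents = [np.random.randn(3) for _ in range(4)]
for _ in range(10):
    agents = gossip_round(agents)
print(np.std(agents, axis=0))  # spread across agents shrinks with each round
```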

December 8, 2019

Anti-efficient encoding in emergent communication

Neural Information Processing Systems (NeurIPS)

Despite renewed interest in emergent language simulations with neural networks, little is known about the basic properties of the induced code and how they compare to human language. One fundamental characteristic of the latter, known as Zipf’s Law of Abbreviation (ZLA), is that more frequent words are efficiently associated with shorter strings. We study whether the same pattern emerges when two neural networks, a “speaker” and a “listener”, are trained to play a signaling game.

By: Rahma Chaabouni, Eugene Kharitonov, Emmanuel Dupoux, Marco Baroni
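
For readers unfamiliar with ZLA, one hedged way to test for it in emitted messages is to check whether message length correlates negatively with how often the underlying meaning occurs. The data and the use of a Spearman correlation below are illustrative assumptions, not the paper's protocol.

```python
# Illustrative ZLA check: under Zipf's Law of Abbreviation, more frequent
# meanings should tend to receive shorter messages (negative correlation).
from collections import Counter
from scipy.stats import spearmanr

# Hypothetical (meaning, emitted message) pairs from a trained speaker.
episodes = [("dog", "aa"), ("dog", "aa"), ("dog", "ab"),
            ("cat", "abb"), ("cat", "abb"),
            ("owl", "abba")]

freq = Counter(meaning for meaning, _ in episodes)
freqs = [freq[meaning] for meaning, _ in episodes]
lengths = [len(message) for _, message in episodes]

rho, p_value = spearmanr(freqs, lengths)
print(f"Spearman rho = {rho:.2f} (ZLA-like codes give rho < 0)")
```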

December 8, 2019

Learning to Perform Local Rewriting for Combinatorial Optimization

Neural Information Processing Systems (NeurIPS)

Search-based methods for hard combinatorial optimization are often guided by heuristics. Tuning these heuristics for different conditions and problem instances is often time-consuming. In this paper, we propose NeuRewriter, which learns a policy to pick heuristics and rewrite the local components of the current solution to iteratively improve it until convergence.

By: Xinyun Chen, Yuandong Tian
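
The rewriting loop itself is simple to picture. Below is a toy version on a job-scheduling instance in which the learned region-picking and rule-picking policies are replaced by greedy choices; the instance, objective, and rewrite rule are illustrative assumptions, not the paper's setup.

```python
# Sketch of iterative local rewriting: pick a region, apply a rewrite rule,
# keep the result if it improves the solution, repeat until convergence.
def cost(schedule, durations):
    """Total completion time of jobs processed in the given order."""
    t, total = 0, 0
    for job in schedule:
        t += durations[job]
        total += t
    return total

def rewrite_until_converged(schedule, durations):
    improved = True
    while improved:
        improved = False
        for i in range(len(schedule) - 1):           # region picker (exhaustive here)
            candidate = schedule[:]
            candidate[i], candidate[i + 1] = candidate[i + 1], candidate[i]  # rewrite rule
            if cost(candidate, durations) < cost(schedule, durations):
                schedule, improved = candidate, True
    return schedule

durations = {0: 5, 1: 2, 2: 9, 3: 1}
print(rewrite_until_converged([0, 1, 2, 3], durations))  # shorter jobs drift to the front
```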

December 8, 2019

Differentiable Convex Optimization Layers

Neural Information Processing Systems (NeurIPS)

Recent work has shown how to embed differentiable optimization problems (that is, problems whose solutions can be backpropagated through) as layers within deep learning architectures. This method provides a useful inductive bias for certain problems, but existing software for differentiable optimization layers is rigid and difficult to apply to new settings. In this paper, we propose an approach to differentiating through disciplined convex programs, a subclass of convex optimization problems used by domain-specific languages (DSLs) for convex optimization.

By: Akshay Agrawal, Brandon Amos, Shane Barratt, Stephen Boyd, Steven Diamond, J. Zico Kolter
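
The approach described here is released as the cvxpylayers library; a minimal example of building a differentiable layer from a disciplined parametrized program and backpropagating through its solution, along the lines of the library's documented usage, looks roughly like this.

```python
# Minimal differentiable convex optimization layer with the cvxpylayers
# PyTorch backend: solve a parametrized convex program in the forward pass
# and differentiate the solution with respect to the parameters.
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n, m = 2, 3
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
problem = cp.Problem(cp.Minimize(cp.pnorm(A @ x - b, p=1)), [x >= 0])
assert problem.is_dpp()  # the layer requires a disciplined parametrized program

layer = CvxpyLayer(problem, parameters=[A, b], variables=[x])

A_t = torch.randn(m, n, requires_grad=True)
b_t = torch.randn(m, requires_grad=True)
solution, = layer(A_t, b_t)      # forward pass solves the convex program
solution.sum().backward()        # gradients flow back to A_t and b_t
```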

December 8, 2019

On the Curved Geometry of Accelerated Optimization

Neural Information Processing Systems (NeurIPS)

In this work we propose a differential geometric motivation for Nesterov’s accelerated gradient method (AGM) for strongly-convex problems. By considering the optimization procedure as occurring on a Riemannian manifold with a natural structure, the AGM can be seen as the proximal point method applied in this curved space.

By: Aaron Defazio
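
For reference, one standard form of the method the paper analyzes, for an $L$-smooth, $\mu$-strongly-convex objective $f$ with condition number $\kappa = L/\mu$:

```latex
% A standard statement of Nesterov's AGM in the strongly convex setting.
\begin{aligned}
y_{k+1} &= x_k - \tfrac{1}{L}\,\nabla f(x_k), \\
x_{k+1} &= y_{k+1} + \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\,\bigl(y_{k+1} - y_k\bigr).
\end{aligned}
```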

December 8, 2019

Levenshtein Transformer

Neural Information Processing Systems (NeurIPS)

Modern neural sequence generation models are built to either generate tokens step-by-step from scratch or (iteratively) modify a sequence of tokens bounded by a fixed length. In this work, we develop Levenshtein Transformer, a new partially autoregressive model devised for more flexible and amenable sequence generation. Unlike previous approaches, the basic operations of our model are insertion and deletion. Their combination facilitates not only generation but also sequence refinement, allowing dynamic length changes.

By: Jiatao Gu, Changhan Wang, Jake (Junbo) Zhao
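
Decoding can be pictured as alternating edit passes until the sequence stops changing. The schematic below stubs out the three prediction heads (deletion, placeholder insertion, token filling) as plain callables; the names and control flow are a simplified reading of the model, not the released code.

```python
# Schematic refinement loop in the spirit of the Levenshtein Transformer:
# delete tokens, open placeholder slots, fill them, and repeat to convergence.
def refine(tokens, delete_policy, placeholder_policy, fill_policy, max_iters=10):
    for _ in range(max_iters):
        before = list(tokens)
        # 1) Deletion: drop tokens the deletion head marks for removal.
        tokens = [t for t in tokens if not delete_policy(t, tokens)]
        # 2) Placeholder insertion: decide how many slots to open after each token.
        with_slots = []
        for i, t in enumerate(tokens):
            with_slots.append(t)
            with_slots.extend(["<plh>"] * placeholder_policy(i, tokens))
        # 3) Token prediction: fill every placeholder with a real token.
        tokens = [fill_policy(i, with_slots) if t == "<plh>" else t
                  for i, t in enumerate(with_slots)]
        if tokens == before:   # converged: no edits were made this pass
            break
    return tokens
```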

December 8, 2019

RUBi: Reducing Unimodal Biases for Visual Question Answering

Neural Information Processing Systems (NeurIPS)

Visual Question Answering (VQA) is the task of answering questions about an image. VQA models often exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer from a large drop in performance when evaluated on data outside their training set distribution. This critical issue makes them unsuitable for real-world settings. We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e., examples that can be correctly classified without looking at the image.

By: Remi Cadene, Corentin Dancette, Hedi Ben-younes, Matthieu Cord, Devi Parikh
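
One way to picture the strategy: a question-only branch predicts answers from the question alone, and its output is turned into a mask that rescales the main model's logits during training, so questions answerable without the image contribute less gradient. The sketch below follows that idea; layer names, shapes, and the exact loss are illustrative assumptions.

```python
# Sketch of a bias-reduction head in the spirit of RUBi: a question-only
# branch masks the multimodal logits so biased examples are down-weighted.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasReductionHead(nn.Module):
    def __init__(self, q_dim=512, num_answers=3000):
        super().__init__()
        self.question_only = nn.Linear(q_dim, num_answers)

    def forward(self, fused_logits, q_features):
        q_logits = self.question_only(q_features)
        masked = fused_logits * torch.sigmoid(q_logits)  # dampens question-predictable answers
        return masked, q_logits

def training_loss(masked, q_logits, targets):
    # Both the masked multimodal branch and the question-only branch are supervised.
    return F.cross_entropy(masked, targets) + F.cross_entropy(q_logits, targets)
```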

December 8, 2019

On the Ineffectiveness of Variance Reduced Optimization for Deep Learning

Neural Information Processing Systems (NeurIPS)

The application of stochastic variance reduction to optimization has shown remarkable recent theoretical and practical success. The applicability of these techniques to the hard non-convex optimization problems encountered during training of modern deep neural networks is an open problem. We show that naive application of the SVRG technique and related approaches fails, and explore why.

By: Aaron Defazio, Leon Bottou
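
For context, the SVRG update examined here corrects each stochastic gradient with a control variate built from a periodic snapshot of the weights and a full gradient computed at that snapshot. The sketch below is a textbook version on a toy least-squares problem, not the paper's experimental setup.

```python
# Textbook SVRG sketch: variance-reduced updates around a weight snapshot.
import numpy as np

def svrg(grad_i, full_grad, w0, data, lr=0.1, epochs=5):
    w = w0.copy()
    for _ in range(epochs):
        w_snapshot = w.copy()
        mu = full_grad(w_snapshot, data)          # full gradient at the snapshot
        for i in np.random.permutation(len(data)):
            # variance-reduced gradient estimate
            v = grad_i(w, data[i]) - grad_i(w_snapshot, data[i]) + mu
            w -= lr * v
    return w

# Toy least-squares example: grad of 0.5*(w.x - y)^2 is (w.x - y)*x.
data = [(np.array([1.0, 2.0]), 3.0), (np.array([2.0, 1.0]), 3.0)]
grad_i = lambda w, d: (w @ d[0] - d[1]) * d[0]
full_grad = lambda w, ds: sum(grad_i(w, d) for d in ds) / len(ds)
print(svrg(grad_i, full_grad, np.zeros(2), data))  # approaches [1, 1]
```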

December 8, 2019

Cross-channel Communication Networks

Neural Information Processing Systems (NeurIPS)

Convolutional neural networks process input data by sending channel-wise feature response maps to subsequent layers. While a lot of progress has been made by making networks deeper, information from each channel can only be propagated from lower levels to higher levels in a hierarchical feed-forward manner. If each filter in a convolutional layer is viewed as a neuron, those neurons do not communicate explicitly within a layer in CNNs. We introduce a novel network unit called the Cross-channel Communication (C3) block, a simple yet effective module that encourages neuron communication within the same layer.

By: Jianwei Yang, Zhile Ren, Hongyuan Zhu, Ji Lin, Chuang Gan, Devi Parikh
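
To make the idea concrete, the sketch below lets channels within one layer exchange information through attention over their pooled summaries and feeds the result back as a residual. This is a generic illustration of within-layer channel communication; the published C3 block's internals differ, and the design here is an assumption.

```python
# Illustrative within-layer channel communication: summarize each channel,
# let the summaries interact via attention, and broadcast the updates back.
import torch
import torch.nn as nn

class CrossChannelTalk(nn.Module):
    def __init__(self, embed_dim=16):
        super().__init__()
        self.encode = nn.Linear(1, embed_dim)     # channel summary -> embedding
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=1, batch_first=True)
        self.decode = nn.Linear(embed_dim, 1)     # embedding -> per-channel update

    def forward(self, x):                         # x: (batch, channels, H, W)
        summary = x.mean(dim=(2, 3)).unsqueeze(-1)         # (B, C, 1) pooled responses
        tokens = self.encode(summary)                      # (B, C, D)
        talked, _ = self.attn(tokens, tokens, tokens)      # channels attend to channels
        update = self.decode(talked)                       # (B, C, 1)
        return x + update.unsqueeze(-1)                    # broadcast back over H, W

feat = torch.randn(2, 64, 8, 8)
print(CrossChannelTalk()(feat).shape)  # torch.Size([2, 64, 8, 8])
```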