
June 7, 2019

Cycle-Consistency for Robust Visual Question Answering

Computer Vision and Pattern Recognition (CVPR)

Despite significant progress in Visual Question Answering over the years, the robustness of today’s VQA models leaves much to be desired. We introduce a new evaluation protocol and associated dataset (VQA-Rephrasings) and show that state-of-the-art VQA models are notoriously brittle to linguistic variations in questions.

By: Meet Shah, Xinlei Chen, Marcus Rohrbach, Devi Parikh

June 3, 2019

Pay less attention with Lightweight and Dynamic Convolutions

International Conference on Learning Representations (ICLR)

Self-attention is a useful mechanism to build generative models for language and images. It determines the importance of context elements by comparing each element to the current time step. In this paper, we show that a very lightweight convolution can perform competitively with the best reported self-attention results.

By: Felix Wu, Angela Fan, Alexei Baevski, Yann Dauphin, Michael Auli
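As we understand the paper, a lightweight convolution is a depthwise convolution whose kernel weights are softmax-normalized over the kernel width and shared across groups ("heads") of channels, so the weights do not depend on the current time step the way self-attention scores do. The minimal sketch below assumes causal (left) padding and plain Python lists; `light_conv` and its arguments are hypothetical names, not the authors' code.

```python
import math
from typing import List

def softmax(ws: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(ws)
    exps = [math.exp(w - m) for w in ws]
    total = sum(exps)
    return [e / total for e in exps]

def light_conv(x: List[List[float]], weights: List[List[float]],
               num_heads: int) -> List[List[float]]:
    """Causal lightweight convolution (hypothetical sketch).

    x: T time steps, each a list of d channel values.
    weights: num_heads raw kernels, each of width k.
    """
    d = len(x[0])
    assert d % num_heads == 0, "channels must split evenly across heads"
    k = len(weights[0])
    kernels = [softmax(w) for w in weights]  # normalize over the kernel width
    group = d // num_heads                   # channels per head
    out = []
    for i in range(len(x)):
        row = []
        for c in range(d):
            kern = kernels[c // group]       # channel c shares its head's kernel
            acc = 0.0
            for j in range(k):
                t = i - (k - 1) + j          # causal window: positions i-k+1 .. i
                if t >= 0:                   # implicit zero padding on the left
                    acc += kern[j] * x[t][c]
            row.append(acc)
        out.append(row)
    return out

# Usage: with all-zero raw weights the kernel is uniform, i.e. a moving average.
y = light_conv([[3.0], [6.0], [9.0]], [[0.0, 0.0, 0.0]], num_heads=1)
```

Because each head stores only k weights instead of a full d-by-k depthwise kernel, the parameter count is independent of the channel width.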

June 2, 2019

The emergence of number and syntax units in LSTM language models

Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)

We present here a detailed study of the inner mechanics of number tracking in LSTMs at the single neuron level. We discover that long-distance number information is largely managed by two “number units”.

By: Yair Lakretz, Germán Kruszewski, Theo Desbordes, Dieuwke Hupkes, Stanislas Dehaene, Marco Baroni

May 31, 2019

Abusive Language Detection with Graph Convolutional Networks

North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)

Abuse on the Internet represents a significant societal problem of our time. Previous research on automated abusive language detection on Twitter has shown that community-based profiling of users is a promising technique for this task. However, existing approaches only capture shallow properties of online communities by modeling follower–following relationships.

By: Pushkar Mishra, Marco Del Tredici, Helen Yannakoudakis, Ekaterina Shutova

May 6, 2019

Efficient Lifelong Learning with A-GEM

International Conference on Learning Representations (ICLR)

In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong approaches in terms of sample complexity, computational cost, and memory cost.

By: Arslan Chaudhry, Marc'Aurelio Ranzato, Marcus Rohrbach, Mohamed Elhoseiny

May 6, 2019

Selfless Sequential Learning

International Conference on Learning Representations (ICLR)

Sequential learning, also called lifelong learning, studies the problem of learning tasks in a sequence with access restricted to only the data of the current task. In this paper, we look at a scenario with fixed model capacity and postulate that the learning process should not be selfish, i.e., it should account for future tasks to be added and thus leave enough capacity for them.

By: Rahaf Aljundi, Marcus Rohrbach, Tinne Tuytelaars

May 6, 2019

Adaptive Input Representations for Neural Language Modeling

International Conference on Learning Representations (ICLR)

We introduce adaptive input representations for neural language modeling which extend the adaptive softmax of Grave et al. (2017) to input representations of variable capacity.

By: Alexei Baevski, Michael Auli
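To make "input representations of variable capacity" concrete, here is a minimal sketch assuming the vocabulary is sorted by frequency and split into bands at hypothetical cutoffs, where band i uses embedding dimension d // factor**i and a linear projection back to the common dimension d. The class name, cutoffs, factor, and random initialization are illustrative assumptions, not the authors' settings or code.

```python
import random
from typing import List

class AdaptiveInput:
    """Hypothetical adaptive input embedding: frequent tokens get wider vectors."""

    def __init__(self, cutoffs: List[int], d: int, factor: int, seed: int = 0):
        rng = random.Random(seed)
        self.d = d
        self.bands = []  # (start, end, embedding table, d x dim projection)
        start = 0
        for i, end in enumerate(cutoffs):
            dim = max(1, d // (factor ** i))  # capacity shrinks for rarer bands
            emb = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(end - start)]
            proj = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(d)]
            self.bands.append((start, end, emb, proj))
            start = end

    def embed(self, token: int) -> List[float]:
        for start, end, emb, proj in self.bands:
            if start <= token < end:
                vec = emb[token - start]
                # Project the band's small vector up to the shared dimension d.
                return [sum(p * v for p, v in zip(row, vec)) for row in proj]
        raise ValueError(f"token {token} is outside the vocabulary")

    def num_params(self) -> int:
        return sum(len(emb) * len(emb[0]) + len(proj) * len(proj[0])
                   for _, _, emb, proj in self.bands)

# Usage: a 5,000-token vocabulary split into bands of 10, 990 and 4,000 tokens.
layer = AdaptiveInput(cutoffs=[10, 1000, 5000], d=64, factor=4)
```

With these assumed settings the layer holds 37,856 parameters, compared with 320,000 for a full 5,000 x 64 embedding table, while every token still maps to a 64-dimensional vector.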

May 4, 2019

Quasi-Hyperbolic Momentum and Adam for Deep Learning

International Conference on Learning Representations (ICLR)

Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. We describe numerous connections to and identities with other algorithms, and we characterize the set of two-state optimization algorithms that QHM can recover.

By: Jerry Ma, Denis Yarats
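The abstract describes QHM as averaging a plain SGD step with a momentum step. A minimal sketch, assuming the exponential-moving-average form of the momentum buffer, g ← β·g + (1 − β)·∇L(θ) followed by θ ← θ − α[(1 − ν)·∇L(θ) + ν·g]; the function name and the toy objective are hypothetical choices for illustration.

```python
from typing import List, Tuple

def qhm_step(theta: List[float], buf: List[float], grad: List[float],
             lr: float, beta: float, nu: float) -> Tuple[List[float], List[float]]:
    """One quasi-hyperbolic momentum (QHM) update on plain Python lists."""
    # Momentum buffer: exponential moving average of past gradients.
    new_buf = [beta * b + (1.0 - beta) * g for b, g in zip(buf, grad)]
    # Step: nu-weighted average of the plain SGD direction and the momentum direction.
    new_theta = [t - lr * ((1.0 - nu) * g + nu * b)
                 for t, g, b in zip(theta, grad, new_buf)]
    return new_theta, new_buf

# Usage: minimize f(x) = x^2 (gradient 2x), starting from x = 5.
theta, buf = [5.0], [0.0]
for _ in range(200):
    grad = [2.0 * x for x in theta]
    theta, buf = qhm_step(theta, buf, grad, lr=0.1, beta=0.9, nu=0.7)
```

Setting nu=0 reduces the update to plain SGD, and nu=1 reduces it to momentum SGD in this moving-average form, which is why QHM can be described as an interpolation between the two.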

April 15, 2019

Optimizing over a Restricted Policy Class in MDPs


We address the problem of finding an optimal policy in a Markov decision process under a restricted policy class defined by the convex hull of a set of base policies. This problem is of great interest in applications in which a number of reasonably good (or safe) policies are already known and we are interested in optimizing in their convex hull.

By: Ershad Banijamali, Yasin Abbasi-Yadkori, Mohammad Ghavamzadeh, Nikos Vlassis
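A policy in the convex hull of base policies can be sketched as the state-wise mixture π_w(a|s) = Σ_i w_i π_i(a|s) with weights w on the probability simplex. The example below assumes a hypothetical two-state, two-action MDP and simply grid-searches the mixing weight with exact-style iterative policy evaluation; it illustrates the search space, not the optimization method proposed in the paper.

```python
from typing import List

Policy = List[List[float]]  # pi[s][a] = probability of action a in state s

# Hypothetical MDP: P[s][a][s2] transition probabilities, R[s][a] rewards.
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: action 0 stays, action 1 moves to state 1
     [[0.0, 1.0], [1.0, 0.0]]]   # state 1: action 0 stays, action 1 moves to state 0
R = [[0.0, 0.0],
     [1.0, 0.0]]                 # reward only for staying in state 1

def mix(policies: List[Policy], weights: List[float]) -> Policy:
    """State-wise convex combination of base policies."""
    assert all(w >= 0 for w in weights) and abs(sum(weights) - 1.0) < 1e-9
    n_s, n_a = len(policies[0]), len(policies[0][0])
    return [[sum(w * pi[s][a] for w, pi in zip(weights, policies))
             for a in range(n_a)] for s in range(n_s)]

def evaluate(pi: Policy, P, R, gamma: float = 0.9, iters: int = 1000) -> List[float]:
    """Iterative policy evaluation: V <- R_pi + gamma * P_pi V."""
    n = len(P)
    v = [0.0] * n
    for _ in range(iters):
        v = [sum(pi[s][a] * (R[s][a] + gamma * sum(P[s][a][s2] * v[s2] for s2 in range(n)))
                 for a in range(len(pi[s]))) for s in range(n)]
    return v

always_0: Policy = [[1.0, 0.0], [1.0, 0.0]]
always_1: Policy = [[0.0, 1.0], [0.0, 1.0]]

# Grid search over the weight on always_0; value is measured from state 0.
best_w, best_v = max(((w, evaluate(mix([always_0, always_1], [w, 1 - w]), P, R)[0])
                      for w in [i / 100 for i in range(101)]), key=lambda t: t[1])
```

In this toy MDP both base policies earn zero value from state 0, yet a strict mixture earns a positive value, so the best policy in the hull lies strictly inside it rather than at a base policy.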

April 2, 2019

Bandana: Using Non-Volatile Memory for Storing Deep Learning Models

Conference on Systems and Machine Learning (SysML)

Typical large-scale recommender systems use deep learning models that are stored in large amounts of DRAM. These models often rely on embeddings, which consume most of the required memory. We present Bandana, a storage system that reduces the DRAM footprint of embeddings by using non-volatile memory (NVM) as the primary storage medium, with a small amount of DRAM as a cache.

By: Assaf Eisenman, Maxim Naumov, Darryl Gardner, Misha Smelyanskiy, Sergey Pupyrev, Kim Hazelwood, Asaf Cidon, Sachin Katti
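The storage layout in the abstract (embeddings resident in NVM, a small DRAM cache in front) can be illustrated with a generic least-recently-used read-through cache. This is a hypothetical sketch: a Python dict stands in for the NVM-resident table, the class name is ours, and Bandana's own placement and cache-admission policies are not reproduced here.

```python
from collections import OrderedDict
from typing import Dict, List

class TieredEmbeddingStore:
    """Hypothetical DRAM LRU cache in front of a slower NVM-resident table."""

    def __init__(self, nvm_table: Dict[int, List[float]], dram_capacity: int):
        self.nvm = nvm_table              # stands in for the NVM-resident table
        self.capacity = dram_capacity     # number of vectors that fit in DRAM
        self.cache: "OrderedDict[int, List[float]]" = OrderedDict()
        self.nvm_reads = 0                # counts slow-tier accesses

    def lookup(self, idx: int) -> List[float]:
        if idx in self.cache:
            self.cache.move_to_end(idx)   # DRAM hit: mark as most recently used
            return self.cache[idx]
        self.nvm_reads += 1               # DRAM miss: read from the slow tier
        vec = self.nvm[idx]
        self.cache[idx] = vec
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used vector
        return vec

# Usage: 100 embeddings in "NVM", room for only 2 in "DRAM".
store = TieredEmbeddingStore({i: [float(i)] * 4 for i in range(100)}, dram_capacity=2)
results = [store.lookup(i) for i in [1, 2, 1, 3, 1, 2]]
```

Six lookups cost four slow-tier reads here; how well such a cache performs depends heavily on access skew, which is the kind of property a real system would exploit when deciding what to keep in DRAM.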