
June 2, 2019

The emergence of number and syntax units in LSTM language models

Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)

We present here a detailed study of the inner mechanics of number tracking in LSTMs at the single neuron level. We discover that long-distance number information is largely managed by two “number units”.

By: Yair Lakretz, Germán Kruszewski, Theo Desbordes, Dieuwke Hupkes, Stanislas Dehaene, Marco Baroni
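
The paper's core probe is single-neuron intervention: clamp one hidden/cell unit of a pretrained LSTM language model to zero and measure how long-distance number agreement degrades. Below is a minimal PyTorch sketch of that style of ablation; the function and argument names (`ablate_unit`, `lstm`, `embed`, `unit`) are illustrative, not the authors' code.

```python
import torch

@torch.no_grad()
def ablate_unit(lstm, embed, token_ids, unit):
    """Run a pretrained LSTM LM while clamping one hidden/cell unit to
    zero at every timestep (hypothetical names throughout)."""
    h = torch.zeros(lstm.num_layers, 1, lstm.hidden_size)
    c = torch.zeros(lstm.num_layers, 1, lstm.hidden_size)
    outputs = []
    for tok in token_ids:
        x = embed(torch.tensor([[tok]]))   # (seq=1, batch=1, emb_dim)
        out, (h, c) = lstm(x, (h, c))
        h[:, :, unit] = 0.0                # silence the candidate unit
        c[:, :, unit] = 0.0
        outputs.append(out)
    return torch.cat(outputs, dim=0)       # (seq_len, 1, hidden_size)
```

If agreement accuracy on sentences like "the keys to the cabinet are/is ..." collapses when a particular unit is silenced, that unit is a candidate "number unit".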

June 1, 2019

Neural Models of Text Normalization for Speech Applications

Computational Linguistics

Machine learning, including neural network techniques, has been applied to virtually every domain in natural language processing. One problem that has been somewhat resistant to effective machine learning solutions is text normalization for speech applications such as text-to-speech synthesis (TTS).

By: Hao Zhang, Richard Sproat, Axel H. Ng, Felix Stahlberg, Xiaochang Peng, Kyle Gorman, Brian Roark
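
For readers unfamiliar with the task: text normalization maps written-form tokens to their spoken form (e.g., "42" to "forty two"), and the difficulty is that the mapping is context-dependent. The toy rule-based verbalizer below only illustrates what the output looks like; it is not the paper's approach, which replaces such hand-written rules with neural models.

```python
ONES = ["zero", "one", "two", "three", "four", "five",
        "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def verbalize_number(n: int) -> str:
    """Spell out 0-99. Real TTS normalization must also decide *whether*
    a token is a cardinal, a year, a phone number, etc., from context."""
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])

print(verbalize_number(42))  # "forty two"
```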

May 6, 2019

No Training Required: Exploring Random Encoders for Sentence Classification

International Conference on Learning Representations (ICLR)

We explore various methods for computing sentence representations from pre-trained word embeddings without any training, i.e., using nothing but random parameterizations.

By: John Wieting, Douwe Kiela
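
One of the paper's simplest random encoders is a bag of random embedding projections: multiply each pre-trained word vector by a fixed, never-trained random matrix and pool over the sentence. A minimal NumPy sketch, with illustrative dimensions and initialization range:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_projection_encode(word_vecs, W, pool="max"):
    """Sentence vector from pre-trained word vectors and a frozen random
    projection W; only the downstream classifier would be trained."""
    projected = word_vecs @ W.T                 # (num_words, d_proj)
    return projected.max(axis=0) if pool == "max" else projected.mean(axis=0)

d_emb, d_proj = 300, 4096                       # assumed sizes
bound = 1.0 / np.sqrt(d_emb)
W = rng.uniform(-bound, bound, size=(d_proj, d_emb))
sentence = rng.normal(size=(7, d_emb))          # stand-in for looked-up embeddings
print(random_projection_encode(sentence, W).shape)   # (4096,)
```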

May 4, 2019

Quasi-Hyperbolic Momentum and Adam for Deep Learning

International Conference on Learning Representations (ICLR)

Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. We describe numerous connections to and identities with other algorithms, and we characterize the set of two-state optimization algorithms that QHM can recover.

By: Jerry Ma, Denis Yarats
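
The abstract pins down the update precisely: a convex combination (weight nu) of a plain SGD step and a momentum step. A minimal NumPy sketch of that rule follows; the default hyperparameters below are illustrative, not prescriptions from the paper.

```python
import numpy as np

def qhm_step(theta, grad, buf, lr=0.1, beta=0.999, nu=0.7):
    """One quasi-hyperbolic momentum (QHM) update:
        buf   <- beta * buf + (1 - beta) * grad
        theta <- theta - lr * ((1 - nu) * grad + nu * buf)
    nu = 0 recovers plain SGD; nu = 1 gives (normalized) momentum SGD.
    """
    buf = beta * buf + (1.0 - beta) * grad
    theta = theta - lr * ((1.0 - nu) * grad + nu * buf)
    return theta, buf

# Toy usage: minimize f(x) = x^2, whose gradient is 2x.
theta, buf = np.array([5.0]), np.zeros(1)
for _ in range(200):
    theta, buf = qhm_step(theta, 2 * theta, buf)
print(theta)  # approaches 0
```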

December 14, 2018

PyText: A seamless path from NLP research to production

We introduce PyText – a deep-learning-based NLP modeling framework built on PyTorch. PyText addresses the often-conflicting requirements of enabling rapid experimentation and of serving models at scale.

By: Ahmed Aly, Kushal Lakhotia, Shicong Zhao, Mrinal Mohit, Barlas Oğuz, Abhinav Arora, Sonal Gupta, Christopher Dewan, Stef Nelson-Lindall, Rushin Shah

December 4, 2018

Improving Semantic Parsing for Task Oriented Dialog

Conversational AI Workshop at NeurIPS 2018

Semantic parsing using hierarchical representations has recently been proposed for task-oriented dialog with promising results. In this paper, we present three different improvements to the model: contextualized embeddings, ensembling, and pairwise re-ranking based on a language model.

By: Arash Einolghozati, Panupong Pasupat, Sonal Gupta, Rushin Shah, Mrinal Mohit, Mike Lewis, Luke Zettlemoyer
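
Of the three improvements, the re-ranking step is the easiest to sketch. The paper trains a pairwise re-ranker; the snippet below substitutes a simpler score interpolation over an n-best list to convey the idea, with all names (`nbest`, `lm_score`, `alpha`) hypothetical:

```python
def rerank(nbest, lm_score, alpha=0.5):
    """Re-rank candidate parses by mixing the parser's own score with a
    language-model score. nbest: list of (parse, parser_score) pairs;
    lm_score: callable returning a log-probability for a parse."""
    return sorted(
        nbest,
        key=lambda cand: (1 - alpha) * cand[1] + alpha * lm_score(cand[0]),
        reverse=True,
    )
```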

December 3, 2018

Explore-Exploit: A Framework for Interactive and Online Learning

Systems for Machine Learning Workshop at NeurIPS 2018

We present Explore-Exploit: a framework designed to collect and utilize user feedback in an interactive and online setting while minimizing regressions in end-user experience. This framework provides a suite of online learning operators for various tasks such as personalized ranking, candidate selection, and active learning.

By: Honglei Liu, Anuj Kumar, Wenhai Yang, Benoit Dumoulin
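
The simplest operator in any explore-exploit suite is epsilon-greedy, which bounds how often users see an exploratory (possibly worse) candidate. A hedged sketch of that classic operator, not the framework's actual API:

```python
import random

def epsilon_greedy(candidates, value_estimate, epsilon=0.1):
    """Serve the best-valued candidate most of the time, but explore a
    random one with probability epsilon so fresh feedback keeps arriving.
    `value_estimate` is a hypothetical callable mapping a candidate to
    its current estimated value."""
    if random.random() < epsilon:
        return random.choice(candidates)           # explore
    return max(candidates, key=value_estimate)     # exploit
```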

November 2, 2018

Do explanations make VQA models more predictable to a human?

Empirical Methods in Natural Language Processing (EMNLP)

A rich line of research attempts to make deep neural networks more transparent by generating human-interpretable ‘explanations’ of their decision process, especially for interactive tasks like Visual Question Answering (VQA). In this work, we analyze if existing explanations indeed make a VQA model – its responses as well as failures – more predictable to a human.

By: Arjun Chandrasekaran, Viraj Prabhu, Deshraj Yadav, Prithvijit Chattopadhyay, Devi Parikh

November 2, 2018

Jump to better conclusions: SCAN both left and right

Empirical Methods in Natural Language Processing (EMNLP)

Lake and Baroni (2018) recently introduced the SCAN data set, which consists of simple commands paired with action sequences and is intended to test the strong generalization abilities of recurrent sequence-to-sequence models. Their initial experiments suggested that such models may fail because they lack the ability to extract systematic rules. Here, we take a closer look at SCAN and show that it does not always capture the kind of generalization that it was designed for.

By: Joost Bastings, Marco Baroni, Jason Weston, Kyunghyun Cho, Douwe Kiela
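
A few examples make the data set concrete. SCAN pairs compositional commands with action-token sequences; the pairs below follow the grammar in Lake and Baroni (2018):

```python
SCAN_EXAMPLES = {
    "jump": "JUMP",
    "jump twice": "JUMP JUMP",
    "turn left": "LTURN",
    "jump left": "LTURN JUMP",
    "jump around left": "LTURN JUMP LTURN JUMP LTURN JUMP LTURN JUMP",
}
```

Strong generalization is then tested by holding out systematic combinations, e.g., training on "jump" and "walk twice" but never showing "jump twice".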

November 2, 2018

Dynamic Meta-Embeddings for Improved Sentence Representations

Empirical Methods in Natural Language Processing (EMNLP)

While one of the first steps in many NLP systems is selecting what pre-trained word embeddings to use, we argue that such a step is better left for neural networks to figure out by themselves.

By: Douwe Kiela, Changhan Wang, Kyunghyun Cho
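
A minimal PyTorch sketch of the idea: project each pre-trained embedding set into a shared space and let a learned attention choose, per token, how to mix them. Dimensions and the scalar attention are illustrative simplifications of the paper's dynamic meta-embeddings:

```python
import torch
import torch.nn as nn

class DynamicMetaEmbedding(nn.Module):
    def __init__(self, embedding_dims, out_dim=256):
        super().__init__()
        # One projection per pre-trained embedding set (e.g., GloVe, fastText).
        self.projections = nn.ModuleList(
            [nn.Linear(d, out_dim) for d in embedding_dims]
        )
        self.attn = nn.Linear(out_dim, 1)  # scalar score per set, per token

    def forward(self, embeddings):
        # embeddings: list of (batch, seq, d_i) tensors, one per set.
        projected = torch.stack(
            [proj(e) for proj, e in zip(self.projections, embeddings)], dim=2
        )                                                     # (batch, seq, n_sets, out_dim)
        weights = torch.softmax(self.attn(projected), dim=2)  # attend over sets
        return (weights * projected).sum(dim=2)               # (batch, seq, out_dim)
```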