
July 28, 2019

ELI5: Long Form Question Answering

Association for Computational Linguistics (ACL)

We introduce the first large-scale corpus for long-form question answering, a task that requires elaborate, in-depth answers to open-ended questions. The dataset comprises 270K threads from the Reddit forum “Explain Like I’m Five” (ELI5), where an online community provides answers that are comprehensible to five-year-olds.

By: Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, Michael Auli

July 28, 2019

Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation

Association for Computational Linguistics (ACL)

Given a rough, word-by-word gloss of a source-language sentence, native speakers of the target language can uncover the latent, fully fluent rendering of the translation. In this work, we explore this intuition by breaking translation into a two-step process: generating a rough gloss by means of a dictionary and then ‘translating’ the resulting pseudo-translation, or ‘Translationese’, into a fully fluent translation.

By: Nima Pourdamghani, Nada Aldarrab, Marjan Ghazvininejad, Kevin Knight, Jonathan May
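The first of the two steps, a word-by-word gloss produced with a bilingual dictionary, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes whitespace tokenization and a hypothetical toy German-to-English dictionary, and it does not show the second step of turning the gloss into a fluent translation.

```python
def gloss(sentence: str, dictionary: dict[str, str]) -> str:
    """Produce a rough word-by-word gloss of `sentence`.

    Each whitespace-separated token is replaced by its dictionary entry;
    tokens without an entry are copied through unchanged.
    """
    return " ".join(dictionary.get(token, token) for token in sentence.split())


# Hypothetical toy dictionary, for illustration only.
toy_de_en = {"ich": "I", "habe": "have", "den": "the", "Hund": "dog", "gesehen": "seen"}

print(gloss("ich habe den Hund gesehen", toy_de_en))  # I have the dog seen
```

The gloss keeps the source word order ("I have the dog seen"), which is the kind of Translationese the second step is meant to repair.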

July 28, 2019

Inferring Concept Hierarchies from Text Corpora via Hyperbolic Embeddings

Association for Computational Linguistics (ACL)

We consider the task of inferring is-a relationships from large text corpora. For this purpose, we propose a new method combining hyperbolic embeddings and Hearst patterns. This approach allows us to set appropriate constraints for inferring concept hierarchies from distributional contexts while also being able to predict missing is-a relationships and to correct wrong extractions.

By: Matt Le, Stephen Roller, Laetitia Papaxanthos, Douwe Kiela, Maximilian Nickel
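To make "hyperbolic embeddings" concrete, the sketch below computes the geodesic distance in the Poincaré ball, one common model of hyperbolic space. This is an illustrative assumption: the abstract does not state which model of hyperbolic space the paper uses, and the example points are hypothetical. In Poincaré-style embeddings, general concepts tend to sit near the origin and specific ones toward the boundary, where distances grow quickly.

```python
import math


def poincare_distance(u: list[float], v: list[float]) -> float:
    """Geodesic distance between two points strictly inside the unit Poincare ball:

    d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# Hypothetical points: a general concept at the origin, a more specific one further out,
# and two specific concepts near the boundary.
root, child = [0.0, 0.0], [0.5, 0.0]
leaf_a, leaf_b = [0.9, 0.0], [0.0, 0.9]
print(poincare_distance(root, child))    # equals ln 3, about 1.0986
print(poincare_distance(leaf_a, leaf_b))  # much larger than the Euclidean distance (about 1.27)
```

Because distances blow up near the boundary, many leaves of a hierarchy can be spread out there while staying close to their common ancestors nearer the origin.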

July 28, 2019

What makes a good conversation? How controllable attributes affect human judgments

North American Chapter of the Association for Computational Linguistics (NAACL)

In this work, we examine two controllable neural text generation methods, conditional training and weighted decoding, in order to control four important attributes for chitchat dialogue: repetition, specificity, response-relatedness and question-asking.

By: Abigail See, Stephen Roller, Douwe Kiela, Jason Weston
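Weighted decoding, one of the two control methods named above, adjusts the decoder's next-token scores with weighted attribute features. A minimal sketch of one greedy step, assuming the score is log p(w | context) plus the sum of weight_i * feature_i(w); the toy distribution, the crude question-asking feature (does the token end with "?"), and the function names are hypothetical.

```python
import math
from typing import Callable


def weighted_decoding_step(
    log_probs: dict[str, float],
    features: list[Callable[[str], float]],
    weights: list[float],
) -> str:
    """Greedily pick the next token by its log-probability plus weighted feature values."""

    def score(token: str) -> float:
        bonus = sum(w * f(token) for w, f in zip(weights, features))
        return log_probs[token] + bonus

    return max(log_probs, key=score)


def asks_question(token: str) -> float:
    """Hypothetical feature: 1.0 if the token looks like a question, else 0.0."""
    return 1.0 if token.endswith("?") else 0.0


# Hypothetical next-token distribution.
next_token = {"yes": math.log(0.5), "maybe": math.log(0.3), "why?": math.log(0.2)}

print(weighted_decoding_step(next_token, [asks_question], [0.0]))  # yes
print(weighted_decoding_step(next_token, [asks_question], [2.0]))  # why?
```

In this sketch, raising the weight trades the model's own preferences for more of the controlled attribute: with weight 2.0, "why?" scores about 0.39, ahead of "yes" at about -0.69.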

July 28, 2019

Training Hybrid Language Models by Marginalizing over Segmentations

Association for Computational Linguistics (ACL)

In this paper, we study the problem of hybrid language modeling, that is, using models that can predict both characters and larger units such as character n-grams or words. With such models, a given string usually has multiple possible segmentations, for example, one using words and one using only characters.

By: Edouard Grave, Sainbayar Sukhbaatar, Piotr Bojanowski, Armand Joulin
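Marginalizing over segmentations means summing the probability of every way of splitting the string into units; a forward (dynamic-programming) recursion does this without enumerating segmentations explicitly. The sketch below simplifies heavily: it assumes each unit has a fixed, context-independent probability from a hypothetical toy vocabulary, whereas a real language model would condition each unit on the preceding text.

```python
def marginal_probability(text: str, unit_probs: dict[str, float], max_len: int = 0) -> float:
    """Sum over all segmentations of `text` into units from `unit_probs`.

    alpha[j] holds the total probability of all segmentations of text[:j]:
        alpha[0] = 1
        alpha[j] = sum over i < j of alpha[i] * p(text[i:j])
    Units longer than `max_len` are never considered (0 means "longest unit").
    """
    max_len = max_len or max(len(unit) for unit in unit_probs)
    n = len(text)
    alpha = [0.0] * (n + 1)
    alpha[0] = 1.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            p = unit_probs.get(text[i:j])
            if p is not None:
                alpha[j] += alpha[i] * p
    return alpha[n]


# Hypothetical vocabulary mixing characters and a larger unit.
units = {"a": 0.2, "b": 0.3, "ab": 0.1}
# Two segmentations: "a" + "b" (0.2 * 0.3) and "ab" (0.1), i.e. 0.16 up to float rounding.
print(marginal_probability("ab", units))
```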

July 27, 2019

Unsupervised Question Answering by Cloze Translation

Association for Computational Linguistics (ACL)

Obtaining training data for Question Answering (QA) is time-consuming and resource-intensive, and existing QA datasets are only available for limited domains and languages. In this work, we explore to what extent high-quality training data is actually required for Extractive QA, and investigate the possibility of unsupervised Extractive QA.

By: Patrick Lewis, Ludovic Denoyer, Sebastian Riedel
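A basic ingredient of cloze-based unsupervised QA is turning a sentence that contains a candidate answer into a cloze question by masking that answer. A minimal sketch, assuming the answer span has already been chosen (for example, a named entity) and using a hypothetical "[MASK]" placeholder; translating the cloze into a natural question, the step the title refers to, is not shown.

```python
def make_cloze(sentence: str, answer: str, mask: str = "[MASK]") -> dict:
    """Replace the first occurrence of `answer` in `sentence` with `mask`.

    Returns an extractive-QA style example: the cloze question, the answer,
    and the character offset of the answer in the original sentence.
    """
    start = sentence.find(answer)
    if start == -1:
        raise ValueError(f"answer {answer!r} not found in sentence")
    cloze = sentence[:start] + mask + sentence[start + len(answer):]
    return {"question": cloze, "answer": answer, "answer_start": start}


example = make_cloze("The Eiffel Tower was completed in 1889.", "1889")
print(example["question"])  # The Eiffel Tower was completed in [MASK].
```

The original sentence, with the answer still in place, can then serve as the context from which the answer is extracted.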

July 27, 2019

The Referential Reader: A Recurrent Entity Network for Anaphora Resolution

Association for Computational Linguistics (ACL)

We present a new architecture for storing and accessing entity mentions during online text processing. While reading the text, entity references are identified, and may be stored by either updating or overwriting a cell in a fixed-length memory.

By: Fei Liu, Luke Zettlemoyer, Jacob Eisenstein

July 27, 2019

Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings

Association for Computational Linguistics (ACL)

Machine translation is highly sensitive to the size and quality of the training data, which has led to an increasing interest in collecting and filtering large parallel corpora. In this paper, we propose a new method for this task based on multilingual sentence embeddings.

By: Mikel Artetxe, Holger Schwenk
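Margin-based scoring, as the title suggests, judges a candidate sentence pair not by raw cosine similarity alone but relative to how similar each sentence is to its k nearest neighbours in the other language, so a sentence that is close to many candidates does not automatically win. A minimal sketch of a ratio-style margin with brute-force neighbour search, assuming the multilingual sentence embeddings are already computed; the vectors below are hypothetical, and the exact scoring function should be treated as an illustrative assumption rather than the paper's definitive formulation.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def knn_mean(values, k: int) -> float:
    """Mean of the k largest values (or of all values if there are fewer than k)."""
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)


def ratio_margin_scores(src: list[list[float]], tgt: list[list[float]], k: int = 4) -> list[list[float]]:
    """Score each (source, target) pair as cos(x, y) divided by the average of
    x's mean similarity to its k nearest targets and y's mean similarity
    to its k nearest sources."""
    sim = [[cosine(x, y) for y in tgt] for x in src]
    src_knn = [knn_mean(row, k) for row in sim]        # neighbourhood density of each source
    tgt_knn = [knn_mean(col, k) for col in zip(*sim)]  # neighbourhood density of each target
    return [
        [sim[i][j] / ((src_knn[i] + tgt_knn[j]) / 2) for j in range(len(tgt))]
        for i in range(len(src))
    ]


# Hypothetical embeddings: each target is a noisy copy of one source.
src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tgt = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
scores = ratio_margin_scores(src, tgt, k=2)
best = [max(range(len(tgt)), key=lambda j: row[j]) for row in scores]
print(best)  # [0, 1, 2]
```

For mining, one would typically keep only the best-scoring candidate per sentence and discard pairs whose score falls below a threshold; the threshold itself would need tuning.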

July 26, 2019

Strategies for Structuring Story Generation

Association for Computational Linguistics (ACL)

Writers often rely on plans or sketches to write long stories, but most current language models generate word by word from left to right. We explore coarse-to-fine models for creating narrative texts of several hundred words, and introduce new models which decompose stories by abstracting over actions and entities.

By: Angela Fan, Mike Lewis, Yann Dauphin

June 18, 2019

Grounded Video Description

Conference on Computer Vision and Pattern Recognition (CVPR)

Video description is one of the most challenging problems in vision and language understanding due to the large variability on both the video and the language side. As a result, models typically shortcut the difficulty of recognition and generate plausible sentences that are based on priors but are not necessarily grounded in the video. In this work, we explicitly link the sentence to the evidence in the video by annotating each noun phrase in a sentence with the corresponding bounding box in one of the frames of a video.

By: Luowei Zhou, Yannis Kalantidis, Xinlei Chen, Jason J. Corso, Marcus Rohrbach