November 25, 2019

Findings of the First Shared Task on Machine Translation Robustness

Conference on Machine Translation (WMT)

We share the findings of the first shared task on improving robustness of Machine Translation (MT). The task provides a testbed representing challenges facing MT models deployed in the real world, and facilitates new approaches to improve models’ robustness to noisy input and domain mismatch. We focus on two language pairs (English-French and English-Japanese), and the submitted systems are evaluated on a blind test set consisting of noisy comments on Reddit and professionally sourced translations.

By: Xian Li, Paul Michel, Antonios Anastasopoulos, Yonatan Belinkov, Nadir Durrani, Orhan Firat, Philipp Koehn, Graham Neubig, Juan Pino, Hassan Sajjad

November 17, 2019

Correlated Uncertainty for Learning Dense Correspondences from Noisy Labels

Neural Information Processing Systems (NeurIPS)

Many machine learning methods depend on human supervision to achieve optimal performance. However, in tasks such as DensePose, where the goal is to establish dense visual correspondences between images, the quality of manual annotations is intrinsically limited. We address this issue by augmenting neural network predictors with the ability to output a distribution over labels, thus explicitly and introspectively capturing the aleatoric uncertainty in the annotations.

By: Natalia Neverova, David Novotny, Andrea Vedaldi

November 10, 2019

Adversarial Bandits with Knapsacks

Symposium on Foundations of Computer Science (FOCS)

We consider Bandits with Knapsacks (henceforth, BwK), a general model for multi-armed bandits under supply/budget constraints. In particular, a bandit algorithm needs to solve a well-known knapsack problem: find an optimal packing of items into a limited-size knapsack. The BwK problem is a common generalization of numerous motivating examples, which range from dynamic pricing to repeated auctions to dynamic ad allocation to network routing and scheduling.

By: Nicole Immorlica, Karthik Abinav Sankararaman, Robert Schapire, Aleksandrs Slivkins

November 5, 2019

Memory Grounded Conversational Reasoning

Conference on Empirical Methods in Natural Language Processing (EMNLP)

We demonstrate a conversational system which engages the user through a multi-modal, multi-turn dialog over the user’s memories. The system can perform QA over memories by responding to user queries to recall specific attributes and associated media (e.g. photos) of past episodic memories. The system can also make proactive suggestions to surface related events or facts from past memories to make conversations more engaging and natural.

By: Shane Moon, Pararth Shah, Anuj Kumar, Rajen Subba

November 5, 2019

Revisiting the Evaluation of Theory of Mind through Question Answering

Conference on Empirical Methods in Natural Language Processing (EMNLP)

Theory of mind, i.e., the ability to reason about intents and beliefs of agents, is an important task in artificial intelligence and central to resolving ambiguous references in natural language dialogue. In this work, we revisit the evaluation of theory of mind through question answering.

By: Matthew Le, Y-Lan Boureau, Maximilian Nickel

November 5, 2019

Recommendation as a Communication Game: Self-Supervised Bot-Play for Goal-oriented Dialogue

Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this work, we collect a goal-driven recommendation dialogue dataset (GoRecDial), which consists of 9,125 dialogue games and 81,260 conversation turns between pairs of human workers recommending movies to each other. The task is specifically designed as a cooperative game between two players working towards a quantifiable common goal.

By: Dongyeop Kang, Anusha Balakrishnan, Pararth Shah, Paul A. Crook, Y-Lan Boureau, Jason Weston

November 5, 2019

CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text

Conference on Empirical Methods in Natural Language Processing (EMNLP)

The recent success of natural language understanding (NLU) systems has been troubled by results highlighting the failure of these models to generalize in a systematic and robust way. In this work, we introduce a diagnostic benchmark suite, named CLUTRR, to clarify some key issues related to the robustness and systematicity of NLU systems.

By: Koustuv Sinha, Shagun Sodhani, Jin Dong, Joelle Pineau, William L. Hamilton

November 4, 2019

Quantifying the Semantic Core of Gender Systems

Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present the first large-scale investigation of the arbitrariness of noun–gender assignments. To that end, we use canonical correlation analysis to correlate the grammatical gender of inanimate nouns with an externally grounded definition of their lexical semantics. We find that 18 languages exhibit a significant correlation between grammatical gender and lexical semantics.

By: Adina Williams, Ryan Cotterell, Lawrence Wolf-Sonkin, Damián E. Blasi, Hanna Wallach

November 4, 2019

Finding Generalizable Evidence by Learning to Convince Q&A Models

Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose a system that finds the strongest supporting evidence for a given answer to a question, using passage-based question-answering (QA) as a testbed. We train evidence agents to select the passage sentences that most convince a pretrained QA model of a given answer when the QA model receives those sentences instead of the full passage.

By: Ethan Perez, Siddharth Karamcheti, Rob Fergus, Jason Weston, Douwe Kiela, Kyunghyun Cho

November 4, 2019

Countering Language Drift via Visual Grounding

Conference on Empirical Methods in Natural Language Processing (EMNLP)

Emergent multi-agent communication protocols are very different from natural language and not easily interpretable by humans. We find that agents that were initially pretrained to produce natural language can also experience detrimental language drift: when a nonlinguistic reward is used in a goal-based task, e.g. some scalar success metric, the communication protocol may easily and radically diverge from natural language.

By: Jason Lee, Kyunghyun Cho, Douwe Kiela