
November 5, 2019

Memory Grounded Conversational Reasoning

Conference on Empirical Methods in Natural Language Processing (EMNLP)

We demonstrate a conversational system which engages the user through a multi-modal, multi-turn dialog over the user’s memories. The system can perform QA over memories by responding to user queries to recall specific attributes and associated media (e.g. photos) of past episodic memories. The system can also make proactive suggestions to surface related events or facts from past memories to make conversations more engaging and natural.

By: Shane Moon, Pararth Shah, Anuj Kumar, Rajen Subba

November 4, 2019

Quantifying the Semantic Core of Gender Systems

Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present the first large-scale investigation of the arbitrariness of noun–gender assignments. To that end, we use canonical correlation analysis to correlate the grammatical gender of inanimate nouns with an externally grounded definition of their lexical semantics. We find that 18 languages exhibit a significant correlation between grammatical gender and lexical semantics.

By: Adina Williams, Ryan Cotterell, Lawrence Wolf-Sonkin, Damián E. Blasi, Hanna Wallach
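
The canonical correlation analysis step described above can be pictured with a small sketch. The snippet below is only an assumption-laden stand-in (random placeholder embeddings and gender labels, scikit-learn's CCA rather than the authors' code), but it shows the shape of the computation: project noun embeddings and one-hot gender assignments into a shared space and read off the canonical correlations.

```python
# Minimal sketch of the CCA step; placeholder data, not the paper's pipeline.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Stand-ins: 300-d embeddings for 1,000 inanimate nouns and a one-hot encoding
# of each noun's grammatical gender (e.g., masculine/feminine/neuter).
embeddings = rng.standard_normal((1000, 300))
gender_onehot = np.eye(3)[rng.integers(0, 3, size=1000)]

cca = CCA(n_components=2)
emb_proj, gen_proj = cca.fit_transform(embeddings, gender_onehot)

# Canonical correlations: how strongly the best linear projections of the
# semantic space align with the gender assignments.
print([np.corrcoef(emb_proj[:, i], gen_proj[:, i])[0, 1] for i in range(2)])
```

With real embeddings, significance would have to be judged against an appropriate baseline (e.g., a permutation test) rather than read off the raw correlations.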

November 2, 2019

Mask-Predict: Parallel Decoding of Conditional Masked Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP)

Most machine translation systems generate text autoregressively from left to right. We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation.

By: Marjan Ghazvininejad, Omer Levy, Yinhan Liu, Luke Zettlemoyer
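
A rough sketch of the parallel decoding loop such a conditional masked language model enables is given below. The `model(src_tokens, tgt)` call, the fixed target length, and the linearly decaying re-masking schedule are illustrative assumptions, not the paper's released code.

```python
# Illustrative mask-predict style decoding; `model` is a hypothetical conditional
# masked LM returning per-position logits of shape (1, tgt_len, vocab).
import torch

def mask_predict_decode(model, src_tokens, tgt_len, mask_id, iterations=10):
    tgt = torch.full((1, tgt_len), mask_id, dtype=torch.long)  # start fully masked
    probs = torch.zeros(1, tgt_len)

    for t in range(iterations):
        logits = model(src_tokens, tgt)
        new_probs, new_tokens = logits.softmax(dim=-1).max(dim=-1)

        masked = tgt.eq(mask_id)
        tgt = torch.where(masked, new_tokens, tgt)      # fill only masked slots
        probs = torch.where(masked, new_probs, probs)

        n_mask = int(tgt_len * (1 - (t + 1) / iterations))
        if n_mask == 0:
            break
        # Re-mask the lowest-confidence predictions for the next iteration.
        worst = probs.topk(n_mask, dim=-1, largest=False).indices
        tgt[0, worst[0]] = mask_id
    return tgt
```

In practice the target length would typically be predicted from the source rather than supplied as a constant; the sketch omits that step.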

November 1, 2019

FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow

Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, we propose a simple, efficient, and effective model for non-autoregressive sequence generation using latent variable models. Specifically, we turn to generative flow, an elegant technique to model complex distributions using neural networks, and design several layers of flow tailored for modeling the conditional density of sequential latent variables.

By: Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, Eduard Hovy
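
As a toy illustration of the kind of flow layer the abstract refers to, the snippet below sketches a single affine coupling transform over per-position latent vectors, conditioned on an encoded source sequence. It is a generic normalizing-flow building block under assumed shapes, not FlowSeq's actual architecture.

```python
# Toy affine coupling layer over sequence latents; shapes and conditioning are
# assumptions for illustration, not the FlowSeq implementation.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim, cond_dim, hidden=256):   # dim assumed even
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim // 2 + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),   # emits scale and shift for the second half
        )

    def forward(self, z, cond):
        # z: (batch, length, dim) latents; cond: (batch, length, cond_dim) source context.
        z1, z2 = z.chunk(2, dim=-1)
        scale, shift = self.net(torch.cat([z1, cond], dim=-1)).chunk(2, dim=-1)
        scale = torch.tanh(scale)                  # keep the transform well-behaved
        z2 = z2 * scale.exp() + shift              # invertible affine update
        log_det = scale.sum(dim=(1, 2))            # contribution to the log-likelihood
        return torch.cat([z1, z2], dim=-1), log_det
```

A full conditional flow would stack many such steps, with invertible mixing layers between them, and train by maximizing the exact log-likelihood of the latent sequence.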

November 1, 2019

Learning to Speak and Act in a Fantasy Text Adventure Game

Conference on Empirical Methods in Natural Language Processing (EMNLP)

We introduce a large-scale crowdsourced text adventure game as a research platform for studying grounded dialogue. In it, agents can perceive, emote, and act whilst conducting dialogue with other agents. Models and humans can both act as characters within the game. We describe the results of training state-of-the-art generative and retrieval models in this setting.

By: Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktäschel, Douwe Kiela, Arthur Szlam, Jason Weston

October 31, 2019

Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded

International Conference on Computer Vision (ICCV)

Many vision and language models suffer from poor visual grounding – often falling back on easy-to-learn language priors rather than basing their decisions on visual concepts in the image. In this work, we propose a generic approach called Human Importance-aware Network Tuning (HINT) that effectively leverages human demonstrations to improve visual grounding.

By: Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, Devi Parikh
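
One way to picture "leveraging human demonstrations to improve visual grounding" is an auxiliary loss that pushes the network's own importance over image regions toward a human-provided importance ranking. The sketch below is a hedged approximation of that idea using gradient-based region importances and a pairwise ranking penalty; it is not the paper's exact objective, and all tensor names are assumptions.

```python
# Hedged sketch of an importance-alignment loss; not HINT's exact formulation.
import torch
import torch.nn.functional as F

def importance_alignment_loss(answer_score, region_feats, human_importance, margin=0.0):
    # region_feats: (batch, regions, dim), must require gradients;
    # human_importance: (batch, regions) scores derived from human attention.
    grads = torch.autograd.grad(answer_score.sum(), region_feats, create_graph=True)[0]
    net_importance = (grads * region_feats).sum(dim=-1)       # (batch, regions)

    # Penalize every region pair whose network ranking disagrees with the humans'.
    diff_net = net_importance.unsqueeze(2) - net_importance.unsqueeze(1)
    diff_hum = human_importance.unsqueeze(2) - human_importance.unsqueeze(1)
    return F.relu(margin - diff_hum.sign() * diff_net).mean()
```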

October 30, 2019

Data-efficient Co-Adaptation of Morphology and Behaviour with Deep Reinforcement Learning

Conference on Robot Learning (CoRL)

Humans and animals are capable of quickly learning new behaviours to solve new tasks. Yet, we often forget that they also rely on a highly specialized morphology that co-adapted with motor control over thousands of years. Although compelling, the idea of co-adapting morphology and behaviours in robots is often infeasible because of long manufacturing times and the need to redesign an appropriate controller for each morphology. In this paper, we propose a novel approach to automatically and efficiently co-adapt a robot morphology and its controller.

By: Kevin Sebastian Luck, Heni Ben Amor, Roberto Calandra

October 29, 2019

Talking With Hands 16.2M: A Large-Scale Dataset of Synchronized Body-Finger Motion and Audio for Conversational Motion Analysis and Synthesis

International Conference on Computer Vision (ICCV)

We present a 16.2 million frame (50 hour) multimodal dataset of two-person face-to-face spontaneous conversations. Our dataset features synchronized body and finger motion as well as audio data. To the best of our knowledge, it represents the largest motion capture and audio dataset of natural conversations to date.

By: Gilwoo Lee, Zhiwei Deng, Shugao Ma, Takaaki Shiratori, Siddhartha S. Srinivasa, Yaser Sheikh

October 28, 2019

DenseRaC: Joint 3D Pose and Shape Estimation by Dense Render-and-Compare

International Conference on Computer Vision (ICCV)

We present DenseRaC, a novel end-to-end framework for jointly estimating 3D human pose and body shape from a monocular RGB image. Our two-step framework takes the body pixel-to-surface correspondence map (i.e., IUV map) as proxy representation and then performs estimation of parameterized human pose and shape.

By: Yuanlu Xu, Song-Chun Zhu, Tony Tung
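
The two-step structure named in the abstract, an IUV correspondence map in and parameterized pose and shape out, can be caricatured as below. The backbone, parameter dimensionalities, and head design are placeholders, and the render-and-compare supervision (rendering the predicted body and comparing it back against the correspondence map) is omitted.

```python
# Schematic stand-in for the estimation step: IUV map -> pose and shape parameters.
import torch
import torch.nn as nn

class IUVToPoseShape(nn.Module):
    def __init__(self, pose_dim=72, shape_dim=10):      # dims chosen as placeholders
        super().__init__()
        self.encoder = nn.Sequential(                    # stand-in for a real backbone
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.pose_head = nn.Linear(64, pose_dim)         # e.g., joint rotations
        self.shape_head = nn.Linear(64, shape_dim)       # e.g., shape coefficients

    def forward(self, iuv_map):                          # iuv_map: (batch, 3, H, W)
        feat = self.encoder(iuv_map)
        return self.pose_head(feat), self.shape_head(feat)
```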

October 28, 2019

SlowFast Networks for Video Recognition

International Conference on Computer Vision (ICCV)

We present SlowFast networks for video recognition. Our model involves (i) a Slow pathway, operating at low frame rate, to capture spatial semantics, and (ii) a Fast pathway, operating at high frame rate, to capture motion at fine temporal resolution.

By: Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He
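
The two-pathway idea can be pictured with a deliberately simplified sketch: the same clip is sampled at two temporal strides, each stride feeds its own (here tiny) 3D-conv pathway, and the resulting features are fused. The strides, channel counts, and single-layer pathways are assumptions for illustration, not the published architecture.

```python
# Simplified two-pathway sketch; the real model uses deep 3D networks with lateral fusion.
import torch
import torch.nn as nn

class TwoPathway(nn.Module):
    def __init__(self, slow_stride=16, fast_stride=2, slow_ch=64, fast_ch=8):
        super().__init__()
        self.slow_stride, self.fast_stride = slow_stride, fast_stride
        self.slow = nn.Conv3d(3, slow_ch, kernel_size=(1, 7, 7), padding=(0, 3, 3))
        self.fast = nn.Conv3d(3, fast_ch, kernel_size=(5, 7, 7), padding=(2, 3, 3))

    def forward(self, video):                            # video: (batch, 3, T, H, W)
        slow_in = video[:, :, ::self.slow_stride]        # few frames: spatial semantics
        fast_in = video[:, :, ::self.fast_stride]        # many frames: fine motion
        slow_feat = self.slow(slow_in).mean(dim=(2, 3, 4))
        fast_feat = self.fast(fast_in).mean(dim=(2, 3, 4))
        return torch.cat([slow_feat, fast_feat], dim=1)  # fused clip descriptor
```

In the paper's design the Fast pathway is kept lightweight relative to the Slow one, and the two are fused with lateral connections throughout the network rather than only at the end.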