Where Are You? Localization from Embodied Dialog

Conference on Empirical Methods in Natural Language Processing (EMNLP)


We present WHERE ARE YOU? (WAY), a dataset of ∼6k dialogs in which two humans – an Observer and a Locator – complete a cooperative localization task. The Observer is spawned at random in a 3D environment and can navigate from first-person views while answering questions from the Locator. The Locator must localize the Observer in a detailed top-down map by asking questions and giving instructions. Based on this dataset, we define three challenging tasks: Localization from Embodied Dialog, or LED (localizing the Observer from dialog history); Embodied Visual Dialog (modeling the Observer); and Cooperative Localization (modeling both agents). In this paper, we focus on the LED task, providing a strong baseline model with detailed ablations that characterize both dataset biases and the importance of various modeling choices. Our best model achieves 32.7% success at identifying the Observer’s location within 3 m in unseen buildings, vs. 70.4% for human Locators.
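
The success figure quoted above is the fraction of episodes in which the predicted location falls within 3 m of the Observer's true location. A minimal sketch of that kind of metric follows. It makes several assumptions the abstract does not state: locations are given as coordinate tuples in metres, distance is straight-line (Euclidean) rather than measured along walkable paths, and a prediction exactly at the threshold counts as a success. The function name `success_rate` and its parameters are hypothetical, not taken from the paper's code.

```python
import math


def success_rate(predicted, actual, threshold_m=3.0):
    """Fraction of predictions within threshold_m metres of the true location.

    Assumptions (not from the source): points are coordinate tuples in
    metres, distance is Euclidean, and the threshold is inclusive.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one episode is required")
    hits = sum(
        1 for p, a in zip(predicted, actual) if math.dist(p, a) <= threshold_m
    )
    return hits / len(predicted)


# Illustrative, made-up coordinates: the first prediction is about 1.4 m
# from its target, the second about 14.1 m away.
example = success_rate([(0.0, 0.0), (10.0, 10.0)], [(1.0, 1.0), (0.0, 0.0)])
```

A path-based distance would be stricter than the straight-line version when walls separate two nearby points, so the choice of distance function can change the reported number.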
