
October 22, 2017

Learning to Reason: End-to-End Module Networks for Visual Question Answering

International Conference on Computer Vision (ICCV)

In this paper, we propose End-to-End Module Networks (N2NMNs), which learn to reason by directly predicting instance-specific network layouts without the aid of a parser.

By: Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko
October 22, 2017

Inferring and Executing Programs for Visual Reasoning

International Conference on Computer Vision (ICCV)

Inspired by module networks, this paper proposes a model for visual reasoning that consists of a program generator that constructs an explicit representation of the reasoning process to be performed, and an execution engine that executes the resulting program to produce an answer.

By: Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, Larry Zitnick, Ross Girshick
October 22, 2017

Dense and Low-Rank Gaussian CRFs using Deep Embeddings

International Conference on Computer Vision (ICCV)

In this work we introduce a structured prediction model that endows the Deep Gaussian Conditional Random Field (G-CRF) with a densely connected graph structure.

By: Siddhartha Chandra, Nicolas Usunier, Iasonas Kokkinos
October 22, 2017

Low-shot Visual Recognition by Shrinking and Hallucinating Features

International Conference on Computer Vision (ICCV)

Low-shot visual learning—the ability to recognize novel object categories from very few examples—is a hallmark of human visual intelligence. Existing machine learning approaches fail to generalize in the same way. We present a low-shot learning benchmark on complex images that mimics challenges faced by recognition systems in the wild.

By: Bharath Hariharan, Ross Girshick
October 22, 2017

Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

International Conference on Computer Vision (ICCV)

We introduce the first goal-driven training for visual question answering and dialog agents. Specifically, we pose a cooperative ‘image guessing’ game between two agents – Q-BOT and A-BOT – who communicate in natural language dialog so that Q-BOT can select an unseen image from a lineup of images.

By: Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee, Dhruv Batra
October 8, 2017

Living a Discrete Life in a Continuous World: Reference in Cross-Modal Entity Tracking

Proceedings of IWCS (12th International Conference on Computational Semantics)

This paper (a) introduces a concrete referential task to test both aspects, called cross-modal entity tracking; and (b) proposes a neural network architecture that uses external memory to build an entity library inspired by the Discourse Representation Structures (DRSs) of Discourse Representation Theory (DRT), with a mechanism to dynamically introduce new referents or add information to referents that are already in the library.

By: Gemma Boleda, Sebastian Padó, Nghia The Pham, Marco Baroni
October 5, 2017

STARDATA: a StarCraft AI Research Dataset

Association for the Advancement of Artificial Intelligence Digital Entertainment Conference

We release a dataset of 65,646 StarCraft replays that contains 1,535 million frames and 496 million player actions. We provide full game state data along with the original replays that can be viewed in StarCraft.

By: Zeming Lin, Jonas Gehring, Vasil Khalidov, Gabriel Synnaeve
September 7, 2017

Natural Language Does Not Emerge ‘Naturally’ in Multi-Agent Dialog

Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, using a Task & Talk reference game between two agents as a testbed, we present a sequence of ‘negative’ results culminating in a ‘positive’ one – showing that while most agent-invented languages are effective (i.e. achieve near-perfect task rewards), they are decidedly not interpretable or compositional.

By: Satwik Kottur, José M.F. Moura, Stefan Lee, Dhruv Batra
September 7, 2017

Grasping the Finer Point: A Supervised Similarity Network for Metaphor Detection

Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, we present the first deep learning architecture designed to capture metaphorical composition. Our results demonstrate that it outperforms the existing approaches in the metaphor identification task.

By: Marek Rei, Luana Bulat, Douwe Kiela, Ekaterina Shutova
September 7, 2017

Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, we show how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors (Kiros et al., 2015) on a wide range of transfer tasks.

By: Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, Antoine Bordes