236 Results

June 18, 2018

Stacked Latent Attention for Multimodal Reasoning

Computer Vision and Pattern Recognition (CVPR)

Attention has been shown to be a pivotal development in deep learning and has been used for a multitude of multimodal learning tasks such as visual question answering and image captioning. In this work, we pinpoint the potential limitations in the design of a traditional attention model.

By: Haoqi Fan, Jiatong Zhou
June 18, 2018

Embodied Question Answering

Computer Vision and Pattern Recognition (CVPR)

We present a new AI task – Embodied Question Answering (EmbodiedQA) – where an agent is spawned at a random location in a 3D environment and asked a question (‘What color is the car?’). In order to answer, the agent must first intelligently navigate to explore the environment, gather necessary visual information through first-person (egocentric) vision, and then answer the question (‘orange’). 

By: Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra
June 18, 2018

Eye In-Painting with Exemplar Generative Adversarial Networks

Computer Vision and Pattern Recognition (CVPR)

This paper introduces a novel approach to in-painting where the identity of the object to remove or change is preserved and accounted for at inference time: Exemplar GANs (ExGANs). ExGANs are a type of conditional GAN that utilize exemplar information to produce high-quality, personalized in-painting results.

By: Brian Dolhansky, Cristian Canton Ferrer
June 18, 2018

Improving Landmark Localization with Semi-Supervised Learning

Computer Vision and Pattern Recognition (CVPR)

We present two techniques to improve landmark localization in images from partially annotated datasets. Our primary goal is to leverage the common situation where precise landmark locations are only provided for a small data subset, but where class labels for classification or regression tasks related to the landmarks are more abundantly available.

By: Sina Honari, Pavlo Molchanov, Stephen Tyree, Pascal Vincent, Christopher Pal, Jan Kautz
June 18, 2018

Multimodal Explanations: Justifying Decisions and Pointing to the Evidence

Computer Vision and Pattern Recognition (CVPR)

Deep models that are both effective and explainable are desirable in many settings; prior explainable models have been unimodal, offering either image-based visualization of attention weights or text-based generation of post-hoc justifications. We propose a multimodal approach to explanation, and argue that the two modalities provide complementary explanatory strengths.

By: Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, Marcus Rohrbach
June 18, 2018

Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering

Computer Vision and Pattern Recognition (CVPR)

A number of studies have found that today’s Visual Question Answering (VQA) models are heavily driven by superficial correlations in the training data and lack sufficient image grounding. To encourage development of models geared towards the latter, we propose a new setting for VQA where for every question type, train and test sets have different prior distributions of answers.

By: Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi
June 18, 2018

Detail-Preserving Pooling in Deep Networks

Computer Vision and Pattern Recognition (CVPR)

In this paper, we aim to leverage recent results on image downscaling for the purposes of deep learning.

By: Faraz Saeedan, Nicolas Weber, Michael Goesele, Stefan Roth
June 18, 2018

Deep Spatio-Temporal Random Fields for Efficient Video Segmentation

Computer Vision and Pattern Recognition (CVPR)

In this work we introduce a time- and memory-efficient method for structured prediction that couples neuron decisions across both space and time. We show that we are able to perform exact and efficient inference on a densely connected spatio-temporal graph by capitalizing on recent advances on deep Gaussian random fields.

By: Siddhartha Chandra, Camille Couprie, Iasonas Kokkinos
June 18, 2018

Canonical Tensor Decomposition for Knowledge Base Completion

International Conference on Machine Learning (ICML)

The problem of Knowledge Base Completion can be framed as a 3rd-order binary tensor completion problem. In this light, the Canonical Tensor Decomposition (CP) (Hitchcock, 1927) seems like a natural solution. However, current implementations of CP on standard Knowledge Base Completion benchmarks are lagging behind their competitors. In this work, we attempt to understand the limits of CP for knowledge base completion.

By: Timothée Lacroix, Nicolas Usunier, Guillaume Obozinski
June 18, 2018

Separating Self-Expression and Visual Content in Hashtag Supervision

Computer Vision and Pattern Recognition (CVPR)

This paper presents an approach that goes beyond modeling simple image-label pairs by using a joint model of images, hashtags, and users. We demonstrate the efficacy of such approaches in image tagging and retrieval experiments, and show how the joint model can be used to perform user-conditional retrieval and tagging.

By: Andreas Veit, Maximilian Nickel, Serge Belongie, Laurens van der Maaten