All Research Areas
Research Areas
Year Published

129 Results

October 22, 2017

Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

International Conference on Computer Vision (ICCV)

We introduce the first goal-driven training for visual question answering and dialog agents. Specifically, we pose a cooperative ‘image guessing’ game between two agents – Q-BOT and A-BOT– who communicate in natural language dialog so that Q-BOT can select an unseen image from a lineup of images.

By: Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee, Dhruv Batra
October 22, 2017

Transitive Invariance for Self-supervised Visual Representation Learning

International Conference on Computer Vision (ICCV)

In this paper, we propose to exploit different self-supervised approaches to learn representations invariant to (i) inter-instance variations (two objects in the same class should have similar features) and (ii) intra-instance variations (viewpoint, pose, deformations, illumination, etc.).

By: Xiaolong Wang, Kaiming He, Abhinav Gupta
October 22, 2017

Predicting Deeper into the Future of Semantic Segmentation

International Conference on Computer Vision (ICCV)

The ability to predict and therefore to anticipate the future is an important attribute of intelligence. We introduce the novel task of predicting semantic segmentations of future frames. Given a sequence of video frames, our goal is to predict segmentation maps of not yet observed video frames that lie up to a second or further in the future.

By: Pauline Luc, Natalia Neverova, Camille Couprie, Jakob Verbeek, Yann LeCun
October 22, 2017

Mask R-CNN

International Conference on Computer Vision (ICCV)

We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.

By: Kaiming He, Georgia Gkioxari, Piotr Dollar, Ross Girshick
October 22, 2017

Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training

International Conference on Computer Vision (ICCV)

While strong progress has been made in image captioning recently, machine and human captions are still quite distinct. To address the challenges in this area, we change the training objective of the caption generator from reproducing ground-truth captions to generating a set of captions that is indistinguishable from human written captions.

By: Rakshith Shetty, Marcus Rohrbach, Lisa Anne Hendricks, Mario Fritz, Bernt Schiele
October 22, 2017

Unsupervised Creation of Parameterized Avatars

International Conference on Computer Vision (ICCV)

We study the problem of mapping an input image to a tied pair consisting of a vector of parameters and an image that is created using a graphical engine from the vector of parameters. The mapping’s objective is to have the output image as similar as possible to the input image.

By: Lior Wolf, Yaniv Taigman, Adam Polyak
October 22, 2017

Learning to Reason: End-to-End Module Networks for Visual Question Answering

International Conference on Computer Vision (ICCV)

In this paper, we propose End-to-End Module Networks (N2NMNs), which learn to reason by directly predicting instance-specific network layouts without the aid of a parser.

By: Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko
October 22, 2017

Segmentation-Aware Convolutional Networks using Local Attention Masks

International Conference on Computer Vision (ICCV)

We introduce an approach to integrate segmentation information within a convolutional neural network (CNN). This counter-acts the tendency of CNNs to smooth information across regions and increases their spatial precision.

By: Adam W. Harley, Konstantinos G. Derpanis, Iasonas Kokkinos
October 22, 2017

Focal Loss for Dense Object Detection

International Conference on Computer Vision (ICCV)

In this paper, we investigate why one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. We design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors.

By: Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollar
October 22, 2017

Low-shot Visual Recognition by Shrinking and Hallucinating Features

International Conference on Computer Vision (ICCV)

Low-shot visual learning—the ability to recognize novel object categories from very few examples—is a hallmark of human visual intelligence. Existing machine learning approaches fail to generalize in the same way. We present a lowshot learning benchmark on complex images that mimics challenges faced by recognition systems in the wild.

By: Bharath Hariharan, Ross Girshick