October 27, 2019

Video Classification with Channel-Separated Convolutional Networks

International Conference on Computer Vision (ICCV)

This paper studies the effects of different design choices in 3D group convolutional networks for video classification. We empirically demonstrate that the amount of channel interactions plays an important role in the accuracy of 3D group convolutional networks.

By: Du Tran, Heng Wang, Lorenzo Torresani, Matt Feiszli
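As a rough illustration of what "channel interactions" means in a group convolution, the sketch below counts the weights of a 3D convolution as the number of groups varies. It is not the paper's code; the function name `conv3d_weight_count` and the 64-channel, 3x3x3 example are assumptions chosen for illustration.

```python
def conv3d_weight_count(in_channels, out_channels, kernel_size, groups=1):
    """Count the weights (no bias) of a 3D convolution split into `groups` groups."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    kt, kh, kw = kernel_size
    # Each output channel only sees in_channels // groups input channels,
    # so more groups means fewer channel interactions and fewer weights.
    return out_channels * (in_channels // groups) * kt * kh * kw


# Hypothetical layer: 64 -> 64 channels with a 3x3x3 kernel.
dense = conv3d_weight_count(64, 64, (3, 3, 3), groups=1)       # every channel pair interacts
depthwise = conv3d_weight_count(64, 64, (3, 3, 3), groups=64)  # no cross-channel interaction
print(dense, depthwise)
```

With one group, every output channel mixes all input channels; with as many groups as channels, each channel is filtered on its own.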

October 27, 2019

NoCaps: Novel object captioning at scale

International Conference on Computer Vision (ICCV)

Image captioning models have achieved impressive results on datasets containing limited visual concepts and large amounts of paired image-caption training data. However, if these models are to ever function in the wild, a much larger variety of visual concepts must be learned, ideally from less supervision. To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task.

By: Harsh Agrawal, Karan Desai, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson

October 27, 2019

Compositional Video Prediction

International Conference on Computer Vision (ICCV)

We present an approach for pixel-level future prediction given an input image of a scene. We observe that a scene is comprised of distinct entities that undergo motion and present an approach that operationalizes this insight. We implicitly predict future states of independent entities while reasoning about their interactions, and compose future video frames using these predicted states.

By: Yufei Ye, Maneesh Singh, Abhinav Gupta, Shubham Tulsiani

October 27, 2019

Live Face De-Identification in Video

International Conference on Computer Vision (ICCV)

We propose a method for face de-identification that enables fully automatic video modification at high frame rates. The goal is to maximally decorrelate the identity while keeping the perception (pose, illumination, and expression) fixed.

By: Oran Gafni, Lior Wolf, Yaniv Taigman

October 27, 2019

Habitat: A Platform for Embodied AI Research

International Conference on Computer Vision (ICCV)

We present Habitat, a platform for research in embodied artificial intelligence (AI). Habitat enables training embodied agents (virtual robots) in highly efficient photorealistic 3D simulation.

By: Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra

October 27, 2019

IMP: Instance Mask Projection for High Accuracy Semantic Segmentation of Things

International Conference on Computer Vision (ICCV)

In this work, we present a new operator, called Instance Mask Projection (IMP), which projects a predicted instance segmentation as a new feature for semantic segmentation. It also supports backpropagation, so it is trainable end-to-end. By adding this operator, we introduce a new paradigm that combines top-down and bottom-up information in semantic segmentation.

By: Cheng-Yang Fu, Tamara Berg, Alexander C. Berg

October 27, 2019

DistInit: Learning Video Representations Without a Single Labeled Video

International Conference on Computer Vision (ICCV)

Video recognition models have progressed significantly over the past few years, evolving from shallow classifiers trained on hand-crafted features to deep spatiotemporal networks. However, the labeled video data required to train such models has not kept up with the ever-increasing depth and sophistication of these networks. In this work, we propose an alternative approach to learning video representations that requires no semantically labeled videos and instead leverages the years of effort in collecting and labeling large and clean still-image datasets.

By: Rohit Girdhar, Du Tran, Lorenzo Torresani, Deva Ramanan

October 27, 2019

Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection

International Conference on Computer Vision (ICCV)

Learning to localize and name object instances is a fundamental problem in vision, but state-of-the-art approaches rely on expensive bounding box supervision. While weakly supervised detection (WSOD) methods relax the need for boxes to that of image-level annotations, even cheaper supervision is naturally available in the form of unstructured textual descriptions that users may freely provide when uploading image content. However, straightforward approaches to using such data for WSOD wastefully discard captions that do not exactly match object names.

By: Keren Ye, Mingda Zhang, Adriana Kovashka, Wei Li, Danfeng Qin, Jesse Berent

October 27, 2019

Improved Conditional VRNNs for Video Prediction

International Conference on Computer Vision (ICCV)

Predicting future frames for a video sequence is a challenging generative modeling task. Promising approaches include probabilistic latent variable models such as the Variational Auto-Encoder. While VAEs can handle uncertainty and model multiple possible future outcomes, they have a tendency to produce blurry predictions. In this work we argue that this is a sign of underfitting.

By: Lluís Castrejón, Nicolas Ballas, Aaron Courville

October 27, 2019

Transferability and Hardness of Supervised Classification Tasks

International Conference on Computer Vision (ICCV)

We propose a novel approach for estimating the difficulty and transferability of supervised classification tasks. Unlike previous work, our approach is solution agnostic and does not require or assume trained models. Instead, we estimate these values using an information theoretic approach: treating training labels as random variables and exploring their statistics.

By: Anh T. Tran, Cuong V. Nguyen, Tal Hassner
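One simple label statistic in this spirit is the empirical conditional entropy of one task's labels given another task's labels on the same examples. The sketch below is an assumption about what such a statistic might look like, not necessarily the paper's exact estimator; the function name `conditional_entropy` and the toy label lists are hypothetical.

```python
import math
from collections import Counter


def conditional_entropy(target_labels, source_labels):
    """Empirical H(target | source) in nats, computed from paired label lists."""
    if len(target_labels) != len(source_labels):
        raise ValueError("label lists must have the same length")
    n = len(target_labels)
    joint = Counter(zip(source_labels, target_labels))  # counts of (z, y) pairs
    marginal = Counter(source_labels)                   # counts of z alone
    entropy = 0.0
    for (z, _y), count in joint.items():
        # p(y, z) * log p(y | z), with probabilities estimated from counts
        entropy -= (count / n) * math.log(count / marginal[z])
    return entropy


# Toy labels (hypothetical): the same four examples labeled for two tasks.
source = [0, 0, 1, 1]
print(conditional_entropy([0, 0, 1, 1], source))  # target fully determined by source
print(conditional_entropy([0, 1, 0, 1], source))  # target independent of source
```

A value near zero means the source labels already determine the target labels; larger values mean the source labels say little about the target. No trained model is involved at any point.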