October 28, 2019

Object-centric Forward Modeling for Model Predictive Control

Conference on Robot Learning (CoRL)

We present an approach to learn an object-centric forward model, and show that this allows us to plan for sequences of actions to achieve distant desired goals. We propose to model a scene as a collection of objects, each with an explicit spatial location and implicit visual feature, and learn to model the effects of actions using random interaction data.

By: Yufei Ye, Dhiraj Gandhi, Abhinav Gupta, Shubham Tulsiani

October 28, 2019

Task-Driven Modular Networks for Zero-Shot Compositional Learning

International Conference on Computer Vision (ICCV)

We focus our study on the problem of compositional zero-shot classification of object-attribute categories. We show in our experiments that current evaluation metrics are flawed as they only consider unseen object-attribute pairs.

By: Senthil Purushwalkam, Maximilian Nickel, Abhinav Gupta, Marc'Aurelio Ranzato

October 28, 2019

Unsupervised Pre-Training of Image Features on Non-Curated Data

International Conference on Computer Vision (ICCV)

Pre-training general-purpose visual features with convolutional neural networks without relying on annotations is a challenging and important task. Most recent efforts in unsupervised feature learning have focused on either small or highly curated datasets like ImageNet, whereas using non-curated raw datasets was found to decrease the feature quality when evaluated on a transfer task. Our goal is to bridge the performance gap between unsupervised methods trained on curated data, which are costly to obtain, and massive raw datasets that are easily available.

By: Mathilde Caron, Piotr Bojanowski, Julien Mairal, Armand Joulin

October 28, 2019

Enhancing Adversarial Example Transferability with an Intermediate Level Attack

International Conference on Computer Vision (ICCV)

We introduce the Intermediate Level Attack (ILA), which attempts to fine-tune an existing adversarial example for greater black-box transferability by increasing its perturbation on a pre-specified layer of the source model, improving upon state-of-the-art methods.

By: Qian Huang, Isay Katsman, Horace He, Zeqi Gu, Serge Belongie, Ser Nam Lim

October 28, 2019

DenseRaC: Joint 3D Pose and Shape Estimation by Dense Render-and-Compare

International Conference on Computer Vision (ICCV)

We present DenseRaC, a novel end-to-end framework for jointly estimating 3D human pose and body shape from a monocular RGB image. Our two-step framework takes the body pixel-to-surface correspondence map (i.e., IUV map) as a proxy representation and then estimates parameterized human pose and shape.

By: Yuanlu Xu, Song-Chun Zhu, Tony Tung

October 28, 2019

TensorMask: A Foundation for Dense Object Segmentation

International Conference on Computer Vision (ICCV)

Sliding-window object detectors that generate bounding-box object predictions over a dense, regular grid have advanced rapidly and proven popular. In contrast, modern instance segmentation approaches are dominated by methods that first detect object bounding boxes, and then crop and segment these regions, as popularized by Mask R-CNN. In this work, we investigate the paradigm of dense sliding-window instance segmentation, which is surprisingly under-explored.

By: Xinlei Chen, Ross Girshick, Kaiming He, Piotr Dollar

October 27, 2019

Rethinking ImageNet Pre-training

International Conference on Computer Vision (ICCV)

We report competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization. The results are no worse than their ImageNet pre-training counterparts even when using the hyper-parameters of the baseline system (Mask R-CNN) that were optimized for fine-tuning pre-trained models, with the sole exception of increasing the number of training iterations so the randomly initialized models may converge.

By: Kaiming He, Ross Girshick, Piotr Dollar

October 27, 2019

Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection

International Conference on Computer Vision (ICCV)

Learning to localize and name object instances is a fundamental problem in vision, but state-of-the-art approaches rely on expensive bounding box supervision. While weakly supervised detection (WSOD) methods relax the need for boxes to that of image-level annotations, even cheaper supervision is naturally available in the form of unstructured textual descriptions that users may freely provide when uploading image content. However, straightforward approaches to using such data for WSOD wastefully discard captions that do not exactly match object names.

By: Keren Ye, Mingda Zhang, Adriana Kovashka, Wei Li, Danfeng Qin, Jesse Berent

October 27, 2019

Exploring Randomly Wired Neural Networks for Image Recognition

International Conference on Computer Vision (ICCV)

Neural networks for image recognition have evolved through extensive manual design from simple chain-like models to structures with multiple wiring paths. The success of ResNets [12] and DenseNets [17] is due in large part to their innovative wiring plans. Now, neural architecture search (NAS) studies are exploring the joint optimization of wiring and operation types; however, the space of possible wirings is constrained and still driven by manual design despite being searched. In this paper, we explore a more diverse set of connectivity patterns through the lens of randomly wired neural networks.

By: Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He

October 27, 2019

Transferability and Hardness of Supervised Classification Tasks

International Conference on Computer Vision (ICCV)

We propose a novel approach for estimating the difficulty and transferability of supervised classification tasks. Unlike previous work, our approach is solution-agnostic and does not require or assume trained models. Instead, we estimate these values using an information-theoretic approach: treating training labels as random variables and exploring their statistics.

By: Anh T. Tran, Cuong V. Nguyen, Tal Hassner