Research Area
Year Published

161 Results

November 10, 2019

An Integrated 6DoF Video Camera and System Design

SIGGRAPH Asia

Designing a fully integrated 360◦ video camera supporting 6DoF head motion parallax requires overcoming many technical hurdles, including camera placement, optical design, sensor resolution, system calibration, real-time video capture, depth reconstruction, and real-time novel view synthesis. While there is a large body of work describing various system components, such as multi-view depth estimation, our paper is the first to describe a complete, reproducible system that considers the challenges arising when designing, building, and deploying a full end-to-end 6DoF video camera and playback environment.

By: Albert Parra Pozo, Michael Toksvig, Terry Filiba Schrager, Joyce Hsu, Uday Mathur, Alexander Sorkine-Hornung, Richard Szeliski, Brian Cabral

October 28, 2019

Unsupervised Pre-Training of Image Features on Non-Curated Data

International Conference on Computer Vision (ICCV)

Pre-training general-purpose visual features with convolutional neural networks without relying on annotations is a challenging and important task. Most recent efforts in unsupervised feature learning have focused on either small or highly curated datasets like ImageNet, whereas using non-curated raw datasets was found to decrease the feature quality when evaluated on a transfer task. Our goal is to bridge the performance gap between unsupervised methods trained on curated data, which are costly to obtain, and massive raw datasets that are easily available.

By: Mathilde Caron, Piotr Bojanowski, Julien Mairal, Armand Joulin

October 28, 2019

Enhancing Adversarial Example Transferability with an Intermediate Level Attack

International Conference on Computer Vision (ICCV)

We introduce the Intermediate Level Attack (ILA), which attempts to fine-tune an existing adversarial example for greater black-box transferability by increasing its perturbation on a pre-specified layer of the source model, improving upon state-of-the-art methods.

By: Qian Huang, Isay Katsman, Horace He, Zeqi Gu, Serge Belongie, Ser Nam Lim

October 27, 2019

Video Classification with Channel-Separated Convolutional Networks

International Conference on Computer Vision (ICCV)

This paper studies the effects of different design choices in 3D group convolutional networks for video classification. We empirically demonstrate that the amount of channel interactions plays an important role in the accuracy of 3D group convolutional networks.

By: Du Tran, Heng Wang, Lorenzo Torresani, Matt Feiszli

October 27, 2019

Compositional Video Prediction

International Conference on Computer Vision (ICCV)

We present an approach for pixel-level future prediction given an input image of a scene. We observe that a scene is comprised of distinct entities that undergo motion and present an approach that operationalizes this insight. We implicitly predict future states of independent entities while reasoning about their interactions, and compose future video frames using these predicted states.

By: Yufei Ye, Maneesh Singh, Abhinav Gupta, Shubham Tulsiani

October 27, 2019

Canonical Surface Mapping via Geometric Cycle Consistency

International Conference on Computer Vision (ICCV)

We explore the task of Canonical Surface Mapping (CSM). Specifically, given an image, we learn to map pixels on the object to their corresponding locations on an abstract 3D model of the category.

By: Nilesh Kulkarni, Abhinav Gupta, Shubham Tulsiani

October 27, 2019

Scaling and Benchmarking Self-Supervised Visual Representation Learning

International Conference on Computer Vision (ICCV)

Self-supervised learning aims to learn representations from the data itself without explicit manual supervision. Existing efforts ignore a crucial aspect of self-supervised learning – the ability to scale to large amount of data because self-supervision requires no manual labels. In this work, we revisit this principle and scale two popular self-supervised approaches to 100 million images.

By: Priya Goyal, Dhruv Mahajan, Abhinav Gupta, Ishan Misra

October 27, 2019

DistInit: Learning Video Representations Without a Single Labeled Video

International Conference on Computer Vision (ICCV)

Video recognition models have progressed significantly over the past few years, evolving from shallow classifiers trained on hand-crafted features to deep spatiotemporal networks. However, labeled video data required to train such models has not been able to keep up with the ever increasing depth and sophistication of these networks. In this work we propose an alternative approach to learning video representations that requires no semantically labeled videos, and instead leverages the years of effort in collecting and labeling large and clean still-image datasets.

By: Rohit Girdhar, Du Tran, Lorenzo Torresani, Deva Ramanan

October 26, 2019

On Network Design Spaces for Visual Recognition

International Conference on Computer Vision (ICCV)

Over the past several years progress in designing better neural network architectures for visual recognition has been substantial. To help sustain this rate of progress, in this work we propose to reexamine the methodology for comparing network architectures. In particular, we introduce a new comparison paradigm of distribution estimates, in which network design spaces are compared by applying statistical techniques to populations of sampled models, while controlling for confounding factors like network complexity.

By: Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, Piotr Dollar

October 26, 2019

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

International Conference on Computer Vision (ICCV)

In natural images, information is conveyed at different frequencies where higher frequencies are usually encoded with fine details and lower frequencies are usually encoded with global structures. Similarly, the output feature maps of a convolution layer can also be seen as a mixture of information at different frequencies. In this work, we propose to factorize the mixed feature maps by their frequencies, and design a novel Octave Convolution (OctConv) operation to store and process feature maps that vary spatially “slower” at a lower spatial resolution reducing both memory and computation cost.

By: Yunpeng Chen, Haoqi Fan, Bing Xu, Zhicheng Yan, Yannis Kalantidis, Marcus Rohrbach, Shuicheng Yan, Jiashi Feng