
October 27, 2019

Habitat: A Platform for Embodied AI Research

International Conference on Computer Vision (ICCV)

We present Habitat, a platform for research in embodied artificial intelligence (AI). Habitat enables training embodied agents (virtual robots) in highly efficient photorealistic 3D simulation.

By: Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra

October 27, 2019

SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition

International Conference on Computer Vision (ICCV)

In this paper we introduce a lightweight “clip-sampling” model that can efficiently identify the most salient temporal clips within a long video. We demonstrate that the computational cost of action recognition on untrimmed videos can be dramatically reduced by invoking recognition only on these most salient clips. Furthermore, we show that this yields significant gains in recognition accuracy compared to analysis of all clips or randomly/uniformly selected clips.
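To make the sampling idea concrete, here is a minimal sketch (not the paper's actual model): a cheap scoring function ranks fixed-length clips and the expensive classifier runs only on the top-k. The names `cheap_scorer`, `heavy_classifier`, and `k` are hypothetical placeholders:

```python
import numpy as np

def select_salient_clips(clips, cheap_scorer, k=3):
    """Score every clip with a lightweight sampler and return the
    indices of the k highest-scoring (most salient) clips."""
    scores = np.array([cheap_scorer(c) for c in clips])
    return np.argsort(scores)[::-1][:k]

def recognize(clips, cheap_scorer, heavy_classifier, k=3):
    """Run the expensive action classifier only on the selected clips
    and average its predictions, instead of processing every clip."""
    idx = select_salient_clips(clips, cheap_scorer, k)
    preds = np.stack([heavy_classifier(clips[i]) for i in idx])
    return preds.mean(axis=0)
```

With N clips and k ≪ N, the heavy model is invoked k times instead of N, which is where the computational savings come from.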

By: Bruno Korbar, Du Tran, Lorenzo Torresani

October 27, 2019

DistInit: Learning Video Representations Without a Single Labeled Video

International Conference on Computer Vision (ICCV)

Video recognition models have progressed significantly over the past few years, evolving from shallow classifiers trained on hand-crafted features to deep spatiotemporal networks. However, the labeled video data required to train such models has not been able to keep up with the ever-increasing depth and sophistication of these networks. In this work we propose an alternative approach to learning video representations that requires no semantically labeled videos, and instead leverages the years of effort in collecting and labeling large and clean still-image datasets.

By: Rohit Girdhar, Du Tran, Lorenzo Torresani, Deva Ramanan

October 26, 2019

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

International Conference on Computer Vision (ICCV)

In natural images, information is conveyed at different frequencies: high frequencies usually encode fine details, while low frequencies usually encode global structures. Similarly, the output feature maps of a convolution layer can be seen as a mixture of information at different frequencies. In this work, we propose to factorize the mixed feature maps by their frequencies, and design a novel Octave Convolution (OctConv) operation to store and process feature maps that vary “slower” spatially at a lower spatial resolution, reducing both memory and computation cost.
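The storage saving can be illustrated with a toy channel split (hypothetical helper names; the real OctConv additionally defines convolutions within and between the two frequency groups):

```python
import numpy as np

def avg_pool2(ch):
    """2x2 average pooling of a single (H, W) channel."""
    h, w = ch.shape
    return ch.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def octave_split(x, alpha=0.5):
    """Split a (C, H, W) feature map: a fraction alpha of the channels is
    treated as low-frequency and stored at half spatial resolution; the
    remaining channels stay at full resolution."""
    c_low = int(alpha * x.shape[0])
    x_high = x[c_low:]                                     # (C - c_low, H, W)
    x_low = np.stack([avg_pool2(ch) for ch in x[:c_low]])  # (c_low, H/2, W/2)
    return x_high, x_low
```

With alpha = 0.5, the low-frequency half occupies a quarter of the memory it would at full resolution, so total feature storage drops to 62.5% of the original.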

By: Yunpeng Chen, Haoqi Fan, Bing Xu, Zhicheng Yan, Yannis Kalantidis, Marcus Rohrbach, Shuicheng Yan, Jiashi Feng

October 20, 2019

Understanding Deep Networks via Extremal Perturbations and Smooth Masks

International Conference on Computer Vision (ICCV)

The problem of attribution is concerned with identifying the parts of an input that are responsible for a model’s output. An important family of attribution methods is based on measuring the effect of perturbations applied to the input. In this paper, we discuss some of the shortcomings of existing approaches to perturbation analysis and address them by introducing the concept of extremal perturbations, which are theoretically grounded and interpretable.
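To make the perturbation idea concrete, here is a crude brute-force stand-in (not the paper's method, which instead optimizes a smooth mask of fixed area): find the single square patch whose deletion lowers the model's score the most. `score_fn` and `size` are hypothetical placeholders:

```python
import numpy as np

def best_occluding_patch(image, score_fn, size):
    """Exhaustively find the size x size patch whose deletion (zeroing)
    causes the largest drop in the model score. Extremal perturbations
    replace this brute-force search over hard patches with an optimized,
    smooth mask of a fixed area."""
    h, w = image.shape
    base = score_fn(image)
    best_drop, best_pos = -np.inf, (0, 0)
    for y in range(h - size + 1):
        for x in range(w - size + 1):
            pert = image.copy()
            pert[y:y + size, x:x + size] = 0.0
            drop = base - score_fn(pert)
            if drop > best_drop:
                best_drop, best_pos = drop, (y, x)
    return best_pos, best_drop
```

The patch that maximizes the score drop marks the region most responsible for the output, which is the attribution question the paper formalizes.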

By: Ruth Fong, Mandela Patrick, Andrea Vedaldi

October 20, 2019

Occlusions for Effective Data Augmentation in Image Classification

ICCV Workshop on Interpreting and Explaining Visual AI Models

In this paper, we show that, when combined with a simple batch-augmentation technique, occlusion-based data augmentation can improve performance on ImageNet for high-capacity models (e.g., ResNet50). We also show that varying the amount of occlusion used during training offers a way to study the robustness of different neural network architectures.
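A minimal sketch of the combination (all names hypothetical; the paper's exact recipe may differ): each training batch is extended with occluded copies of its own images, so every step sees clean and occluded views together:

```python
import numpy as np

def occlude(image, size, rng):
    """Zero out one randomly placed size x size patch of an (H, W) image."""
    h, w = image.shape
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    out = image.copy()
    out[y:y + size, x:x + size] = 0.0
    return out

def batch_augment(batch, size, repeats=2, seed=0):
    """Batch augmentation: append `repeats` occluded copies of every
    image in the batch to the batch itself."""
    rng = np.random.default_rng(seed)
    occluded = [occlude(img, size, rng) for img in batch for _ in range(repeats)]
    return np.concatenate([np.stack(list(batch)), np.stack(occluded)], axis=0)
```

Sweeping `size` over a range of values gives the robustness-vs-occlusion curves the abstract alludes to.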

By: Ruth Fong, Andrea Vedaldi

September 30, 2019

Neural Code Search Evaluation Dataset


There has been increasing interest in code search using natural language. Assessing the performance of such code search models can be difficult without a readily available evaluation suite. In this paper, we present an evaluation dataset of natural language query and code snippet pairs for future work. We also provide the results of two code search models ([6] and [1]) from recent work as a benchmark.

By: Hongyu Li, Seohyun Kim, Satish Chandra

September 17, 2019

Unsupervised Singing Voice Conversion


We present a deep learning method for singing voice conversion. The proposed network is not conditioned on the text or on the notes, and it directly converts the audio of one singer to the voice of another. Training is performed without any form of supervision: no lyrics or any kind of phonetic features, no notes, and no matching samples between singers.

By: Eliya Nachmani, Lior Wolf

September 15, 2019

Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions


We propose a fully convolutional sequence-to-sequence encoder architecture with a simple and efficient decoder. Our model improves the word error rate (WER) on LibriSpeech while being an order of magnitude more efficient than a strong RNN baseline.
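As a loose illustration of the time-depth separation (a toy sketch, not the paper's exact block, which also includes nonlinearities, layer norm, and residual connections): convolve over time independently per channel, then mix channels with a pointwise linear layer:

```python
import numpy as np

def tds_step(x, time_kernel, channel_mix):
    """Time-depth separable step on x of shape (T, C): a shared 1-D
    convolution over time applied to each channel independently,
    followed by a pointwise linear layer that mixes channels."""
    k = len(time_kernel)
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # "same" padding in time (odd k)
    conv = np.stack(
        [np.convolve(xp[:, c], time_kernel, mode="valid") for c in range(x.shape[1])],
        axis=1,
    )
    return conv @ channel_mix  # (T, C) @ (C, C)
```

Separating the temporal convolution from the channel mixing keeps the parameter count far below that of a full 2-D convolution over time and channels, which is one source of the efficiency the abstract claims.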

By: Awni Hannun, Ann Lee, Qiantong Xu, Ronan Collobert

September 10, 2019

Bridging the Gap Between Relevance Matching and Semantic Matching for Short Text Similarity Modeling

Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose a novel model, HCAN (Hybrid Co-Attention Network), that comprises (1) a hybrid encoder module that includes ConvNet-based and LSTM-based encoders, (2) a relevance matching module that measures soft term matches with importance weighting at multiple granularities, and (3) a semantic matching module with co-attention mechanisms that capture context-aware semantic relatedness.

By: Jinfeng Rao, Linqing Liu, Yi Tay, Wei Yang, Peng Shi, Jimmy Lin