

June 18, 2018

Learning by Asking Questions

Computer Vision and Pattern Recognition (CVPR)

We introduce an interactive learning framework for the development and testing of intelligent visual systems, called learning-by-asking (LBA). We explore LBA in the context of the Visual Question Answering (VQA) task.

By: Ishan Misra, Ross Girshick, Rob Fergus, Martial Hebert, Abhinav Gupta, Laurens van der Maaten
June 18, 2018

Learning to Segment Every Thing

Computer Vision and Pattern Recognition (CVPR)

The goal of this paper is to propose a new partially supervised training paradigm, together with a novel weight transfer function, that enables training instance segmentation models on a large set of categories all of which have box annotations, but only a small fraction of which have mask annotations.

By: Ronghang Hu, Piotr Dollar, Kaiming He, Trevor Darrell, Ross Girshick
June 18, 2018

Stacked Latent Attention for Multimodal Reasoning

Computer Vision and Pattern Recognition (CVPR)

Attention has been shown to be a pivotal development in deep learning and has been used for a multitude of multimodal learning tasks such as visual question answering and image captioning. In this work, we pinpoint the potential limitations in the design of a traditional attention model.

By: Haoqi Fan, Jiatong Zhou
June 18, 2018

3D Semantic Segmentation with Submanifold Sparse Convolutional Networks

Computer Vision and Pattern Recognition (CVPR)

We introduce new sparse convolutional operations that are designed to process spatially-sparse data more efficiently, and use them to develop spatially-sparse convolutional networks.

By: Benjamin Graham, Laurens van der Maaten, Martin Engelcke
June 18, 2018

Deep Spatio-Temporal Random Fields for Efficient Video Segmentation

Computer Vision and Pattern Recognition (CVPR)

In this work we introduce a time- and memory-efficient method for structured prediction that couples neuron decisions across both space and time. We show that we are able to perform exact and efficient inference on a densely connected spatio-temporal graph by capitalizing on recent advances in deep Gaussian random fields.

By: Siddhartha Chandra, Camille Couprie, Iasonas Kokkinos
June 18, 2018

Modeling Facial Geometry using Compositional VAEs

Computer Vision and Pattern Recognition (CVPR)

We propose a method for learning non-linear face geometry representations using deep generative models. Our model is a variational autoencoder with multiple levels of hidden variables where lower layers capture global geometry and higher ones encode more local deformations.

By: Timur Bagautdinov, Chenglei Wu, Jason Saragih, Pascal Fua, Yaser Sheikh
June 18, 2018

CondenseNet: An Efficient DenseNet using Learned Group Convolutions

Computer Vision and Pattern Recognition (CVPR)

In this paper we develop CondenseNet, a novel network architecture with unprecedented efficiency. It combines dense connectivity with a novel module called learned group convolution. 

By: Gao Huang, Shichen Liu, Laurens van der Maaten, Kilian Q. Weinberger
June 18, 2018

Detect-and-Track: Efficient Pose Estimation in Videos

Computer Vision and Pattern Recognition (CVPR)

This paper addresses the problem of estimating and tracking human body keypoints in complex, multi-person video. We propose an extremely lightweight yet highly effective approach that builds upon the latest advancements in human detection [17] and video understanding [5].

By: Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, Du Tran
June 18, 2018

Eye In-Painting with Exemplar Generative Adversarial Networks

Computer Vision and Pattern Recognition (CVPR)

This paper introduces a novel approach to in-painting where the identity of the object to remove or change is preserved and accounted for at inference time: Exemplar GANs (ExGANs). ExGANs are a type of conditional GAN that utilize exemplar information to produce high-quality, personalized in-painting results.

By: Brian Dolhansky, Cristian Canton Ferrer
June 18, 2018

Audio to Body Dynamics

Computer Vision and Pattern Recognition (CVPR)

We present a method that takes as input audio of violin or piano playing and outputs a video of skeleton predictions, which are then used to animate an avatar. The key idea is to animate an avatar whose hands move as a pianist's or violinist's would, using audio alone.

By: Eli Shlizerman, Lucio Dery, Hayden Schoen, Ira Kemelmacher-Shlizerman