164 Results

July 12, 2019

VR Facial Animation via Multiview Image Translation


In this work, we present a bidirectional system that can animate avatar heads of both users’ full likeness using consumer-friendly headset-mounted cameras (HMC). There are two main challenges in doing this: unaccommodating camera views and the image-to-avatar domain gap. We address both challenges by leveraging constraints imposed by multiview geometry to establish precise image-to-avatar correspondence, which is then used to learn an end-to-end model for real-time tracking.

By: Shih-En Wei, Jason Saragih, Tomas Simon, Adam W. Harley, Stephen Lombardi, Michal Perdoch, Alexander Hypes, Dawei Wang, Hernan Badino, Yaser Sheikh

June 27, 2019

Sensor Modeling and Benchmarking — A Platform for Sensor and Computer Vision Algorithm Co-Optimization

International Image Sensor Workshop

We predict that applications in AR/VR devices [1] and intelligent devices will lead to the emergence of a new class of image sensors: machine perception CIS (MPCIS). This new class of sensors will produce images and videos optimized primarily for machine vision applications, not human consumption.

By: Andrew Berkovich, Chiao Liu

June 18, 2019

LVIS: A Dataset for Large Vocabulary Instance Segmentation

Conference Computer Vision and Pattern Recognition (CVPR)

Progress on object detection is enabled by datasets that focus the research community’s attention on open challenges. This process led us from simple images to complex scenes and from bounding boxes to segmentation masks. In this work, we introduce LVIS (pronounced ‘el-vis’): a new dataset for Large Vocabulary Instance Segmentation.

By: Agrim Gupta, Piotr Dollar, Ross Girshick

June 18, 2019

Embodied Question Answering in Photorealistic Environments with Point Cloud Perception

Conference Computer Vision and Pattern Recognition (CVPR)

To help bridge the gap between internet vision-style problems and the goal of vision for embodied perception, we instantiate a large-scale navigation task, Embodied Question Answering [1], in photorealistic environments (Matterport 3D).

By: Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra

June 18, 2019

Long-Term Feature Banks for Detailed Video Understanding

Conference Computer Vision and Pattern Recognition (CVPR)

We propose a long-term feature bank—supportive information extracted over the entire span of a video—to augment state-of-the-art video models that otherwise would only view short clips of 2-5 seconds.

By: Chao-Yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krähenbühl, Ross Girshick

June 18, 2019

Grounded Video Description

Conference Computer Vision and Pattern Recognition (CVPR)

Video description is one of the most challenging problems in vision and language understanding due to the large variability both on the video and language side. Models, hence, typically shortcut the difficulty in recognition and generate plausible sentences that are based on priors but are not necessarily grounded in the video. In this work, we explicitly link the sentence to the evidence in the video by annotating each noun phrase in a sentence with the corresponding bounding box in one of the frames of a video.

By: Luowei Zhou, Yannis Kalantidis, Xinlei Chen, Jason J. Corso, Marcus Rohrbach

June 17, 2019

Panoptic Feature Pyramid Networks

Conference Computer Vision and Pattern Recognition (CVPR)

In this work, we perform a detailed study of a minimally extended version of Mask R-CNN with FPN, which we refer to as Panoptic FPN, and show it is a robust and accurate baseline for both instance and semantic segmentation. Given its effectiveness and conceptual simplicity, we hope our method can serve as a strong baseline and aid future research in panoptic segmentation.

By: Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollar

June 17, 2019

DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition

Conference Computer Vision and Pattern Recognition (CVPR)

Motion has been shown to be useful for video understanding, where it is typically represented by optical flow. However, computing flow from video frames is very time-consuming. Recent works directly leverage the motion vectors and residuals readily available in the compressed video to represent motion at no cost. While this avoids flow computation, it also hurts accuracy, since the motion vector is noisy and has substantially reduced resolution, which makes it a less discriminative motion representation.

By: Zheng Shou, Xudong Lin, Yannis Kalantidis, Laura Sevilla-Lara, Marcus Rohrbach, Shih-Fu Chang, Zhicheng Yan

June 17, 2019

Graph-Based Global Reasoning Networks

Conference Computer Vision and Pattern Recognition (CVPR)

Globally modeling and reasoning over relations between regions can be beneficial for many computer vision tasks on both images and videos. Convolutional Neural Networks (CNNs) excel at modeling local relations by convolution operations, but they are typically inefficient at capturing global relations between distant regions and require stacking multiple convolution layers. In this work, we propose a new approach for reasoning globally in which a set of features is globally aggregated over the coordinate space and then projected to an interaction space where relational reasoning can be efficiently computed.

By: Yunpeng Chen, Marcus Rohrbach, Zhicheng Yan, Shuicheng Yan, Jiashi Feng, Yannis Kalantidis

June 16, 2019

Inverse Cooking: Recipe Generation from Food Images

Conference Computer Vision and Pattern Recognition (CVPR)

People enjoy food photography because they appreciate food. Behind each meal there is a story described in a complex recipe and, unfortunately, by simply looking at a food image we do not have access to its preparation process. Therefore, in this paper we introduce an inverse cooking system that recreates cooking recipes given food images.

By: Amaia Salvador, Michal Drozdzal, Xavier Giro-i-Nieto, Adriana Romero