
January 1, 2020

Designing Safe Spaces for Virtual Reality

Ethics in Design and Communication

Virtual Reality (VR) designers accept the ethical responsibility of removing a user’s entire world and replacing it with a fabricated reality. These unique immersive design challenges intensify when virtual experiences become public and socially driven. As female VR designers in 2018, we see an opportunity to fold the language of consent into the design practice of virtual reality, as a means to design safe, accessible virtual spaces.

Publication will be made available in 2020.

By: Michelle Cortese, Andrea Zeller

June 18, 2019

Embodied Question Answering in Photorealistic Environments with Point Cloud Perception

Conference Computer Vision and Pattern Recognition (CVPR)

To help bridge the gap between internet vision-style problems and the goal of vision for embodied perception, we instantiate a large-scale navigation task, Embodied Question Answering [1], in photo-realistic environments (Matterport 3D).

By: Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra

June 17, 2019

DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition

Conference Computer Vision and Pattern Recognition (CVPR)

Motion has been shown to be useful for video understanding, where motion is typically represented by optical flow. However, computing flow from video frames is very time-consuming. Recent works directly leverage the motion vectors and residuals readily available in the compressed video to represent motion at no cost. While this avoids flow computation, it also hurts accuracy, since the motion vector is noisy and has substantially reduced resolution, which makes it a less discriminative motion representation.

By: Zheng Shou, Xudong Lin, Yannis Kalantidis, Laura Sevilla-Lara, Marcus Rohrbach, Shih-Fu Chang, Zhicheng Yan

June 17, 2019

Graph-Based Global Reasoning Networks

Conference Computer Vision and Pattern Recognition (CVPR)

Globally modeling and reasoning over relations between regions can be beneficial for many computer vision tasks on both images and videos. Convolutional Neural Networks (CNNs) excel at modeling local relations by convolution operations, but they are typically inefficient at capturing global relations between distant regions and require stacking multiple convolution layers. In this work, we propose a new approach for reasoning globally in which a set of features are globally aggregated over the coordinate space and then projected to an interaction space where relational reasoning can be efficiently computed.
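The project–reason–reproject idea in this abstract can be illustrated with a minimal sketch: features over the coordinate space are aggregated into a small set of interaction-space nodes, relations are computed there, and the result is projected back. All shapes, weights, and names below are illustrative assumptions, not the paper's architecture; the projection and reasoning weights are learned in the actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

L, C, N = 196, 64, 16             # coordinate positions, channels, interaction nodes
X = rng.standard_normal((L, C))   # flattened spatial features
B = rng.standard_normal((N, L))   # projection weights (learned in practice)
Wg = rng.standard_normal((C, C))  # relational-reasoning weights (illustrative)

V = B @ X       # aggregate: N x C node features in the interaction space
Z = V @ Wg      # reason over relations in the small interaction space
Y = B.T @ Z     # reproject back to the L x C coordinate space

print(X.shape, V.shape, Y.shape)  # (196, 64) (16, 64) (196, 64)
```

The point of the detour through the interaction space is that relational reasoning happens over N nodes rather than L positions, which is cheap when N is much smaller than L.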

By: Yunpeng Chen, Marcus Rohrbach, Zhicheng Yan, Shuicheng Yan, Jiashi Feng, Yannis Kalantidis

June 17, 2019

Panoptic Feature Pyramid Networks

Conference Computer Vision and Pattern Recognition (CVPR)

In this work, we perform a detailed study of a minimally extended version of Mask R-CNN with FPN, which we refer to as Panoptic FPN, and show it is a robust and accurate baseline for both semantic and instance segmentation. Given its effectiveness and conceptual simplicity, we hope our method can serve as a strong baseline and aid future research in panoptic segmentation.

By: Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollar

June 17, 2019

Panoptic Segmentation

Conference Computer Vision and Pattern Recognition (CVPR)

We propose and study a task we name panoptic segmentation (PS). Panoptic segmentation unifies the typically distinct tasks of semantic segmentation (assign a class label to each pixel) and instance segmentation (detect and segment each object instance). The proposed task requires generating a coherent scene segmentation that is rich and complete, an important step toward real-world vision systems.
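The unified output described here can be shown with a toy example: every pixel carries a semantic class, and pixels of "thing" classes additionally carry an instance id, so the scene segmentation is coherent and complete. The class ids and the packing scheme below are made up for illustration; they are not the paper's exact format.

```python
import numpy as np

H, W = 4, 6
semantic = np.zeros((H, W), dtype=np.int32)   # class 0 = road ("stuff")
instance = np.zeros((H, W), dtype=np.int32)   # id 0 for all stuff pixels

semantic[1:3, 1:3] = 1; instance[1:3, 1:3] = 1   # car instance #1
semantic[1:3, 4:6] = 1; instance[1:3, 4:6] = 2   # car instance #2

# A common trick packs class and instance id into one panoptic label map.
panoptic = semantic * 1000 + instance

# Every pixel has exactly one (class, instance) pair, and the two cars
# are distinct segments of the same semantic class.
segments = np.unique(panoptic)
print(segments)   # [0 1001 1002]
```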

By: Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollar

June 16, 2019

3D human pose estimation in video with temporal convolutions and semi-supervised training

Conference Computer Vision and Pattern Recognition (CVPR)

In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints.
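The appeal of dilated temporal convolutions is that stacking a few layers covers a long window of 2D keypoints. A small receptive-field computation illustrates the mechanism; the layer count, kernel sizes, and dilation schedule below are illustrative, not necessarily the paper's exact architecture.

```python
# Receptive-field arithmetic for stacked dilated 1D (temporal) convolutions.
def receptive_field(kernel_sizes, dilations):
    """Number of input frames visible to one output position."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# Four kernel-3 layers with exponentially growing dilation already span
# 81 frames of 2D keypoints, with far fewer parameters than one wide kernel.
print(receptive_field([3, 3, 3, 3], [1, 3, 9, 27]))  # 81
```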

By: Dario Pavllo, Christoph Feichtenhofer, David Grangier, Michael Auli

June 16, 2019

Towards VQA Models That Can Read

Conference Computer Vision and Pattern Recognition (CVPR)

Studies have shown that a dominant class of questions asked by visually impaired users on images of their surroundings involves reading text in the image. But today’s VQA models cannot read! Our paper takes a first step toward addressing this problem.

By: Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, Marcus Rohrbach

June 16, 2019

Leveraging the Present to Anticipate the Future in Videos

CVPR Precognition Workshop

Anticipating actions before they are executed is crucial for a wide range of practical applications, including autonomous driving and robotics. While most prior work in this area requires partial observation of executed actions, in this paper we focus on anticipating actions seconds before they start. Our proposed approach fuses a purely anticipatory model with a complementary model constrained to reason about the present.

By: Antoine Miech, Ivan Laptev, Josef Sivic, Heng Wang, Lorenzo Torresani, Du Tran

June 16, 2019

Kernel Transformer Networks for Compact Spherical Convolution

Conference Computer Vision and Pattern Recognition (CVPR)

Ideally, 360° imagery could inherit the deep convolutional neural networks (CNNs) already trained with great success on perspective projection images. However, existing methods to transfer CNNs from perspective to spherical images introduce significant computational costs and/or degradations in accuracy. We present the Kernel Transformer Network (KTN) to efficiently transfer convolution kernels from perspective images to the equirectangular projection of 360° images.
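The equirectangular projection mentioned here maps image rows and columns linearly to latitude and longitude, which is why distortion (and hence the needed kernel) varies with the row. A small helper sketches that mapping; this is background geometry for the setting, not the KTN method itself.

```python
import math

def pixel_to_angles(x, y, width, height):
    """Map equirectangular pixel (x, y) to (longitude, latitude) in radians."""
    lon = (x / width) * 2 * math.pi - math.pi    # columns span [-pi, pi)
    lat = math.pi / 2 - (y / height) * math.pi   # rows span [pi/2, -pi/2)
    return lon, lat

# The top-left pixel of a 1920x960 equirectangular image sits at the
# left edge of the panorama, at the north pole of the sphere.
print(pixel_to_angles(0, 0, 1920, 960))
```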

By: Yu-Chuan Su, Kristen Grauman