June 16, 2019

Leveraging the Present to Anticipate the Future in Videos

CVPR Precognition Workshop

Anticipating actions before they are executed is crucial for a wide range of practical applications including autonomous driving and robotics. While most prior work in this area requires partial observation of executed actions, in this paper we focus on anticipating actions seconds before they start. Our proposed approach is the fusion of a purely anticipatory model with a complementary model constrained to reason about the present.

By: Antoine Miech, Ivan Laptev, Josef Sivic, Heng Wang, Lorenzo Torresani, Du Tran

June 16, 2019

Kernel Transformer Networks for Compact Spherical Convolution

Conference on Computer Vision and Pattern Recognition (CVPR)

Ideally, 360° imagery could inherit the deep convolutional neural networks (CNNs) already trained with great success on perspective projection images. However, existing methods to transfer CNNs from perspective to spherical images introduce significant computational costs and/or degradations in accuracy. We present the Kernel Transformer Network (KTN) to efficiently transfer convolution kernels from perspective images to the equirectangular projection of 360° images.

By: Yu-Chuan Su, Kristen Grauman

June 16, 2019

Building High Resolution Maps for Humanitarian Aid and Development with Weakly- and Semi-Supervised Learning

Computer Vision for Global Challenges Workshop at CVPR

Detailed maps help governments and NGOs plan infrastructure development and mobilize relief around the world. Mapping is an open-ended task with a seemingly endless number of potentially useful features to be mapped. In this work, we focus on mapping buildings and roads. We do so with techniques that could easily extend to other features such as land use and land classification. We discuss real-world use cases of our maps by NGOs and humanitarian organizations around the world—from sustainable infrastructure planning to disaster relief.

By: Derrick Bonafilia, David Yang, James Gill, Saikat Basu

June 15, 2019

Feature Denoising for Improving Adversarial Robustness

Conference on Computer Vision and Pattern Recognition (CVPR)

Adversarial attacks to image classification systems present challenges to convolutional networks and opportunities for understanding them. This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks.

By: Kaiming He, Yuxin Wu, Laurens van der Maaten, Alan Yuille, Cihang Xie

June 15, 2019

FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search

Conference on Computer Vision and Pattern Recognition (CVPR)

Designing accurate and efficient ConvNets for mobile devices is challenging because the design space is combinatorially large. Due to this, previous neural architecture search (NAS) methods are computationally expensive. ConvNet architecture optimality depends on factors such as input resolution and target devices. However, existing approaches are too resource demanding for case-by-case redesigns. Also, previous work focuses primarily on reducing FLOPs, but FLOP count does not always reflect actual latency.

By: Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, Kurt Keutzer

June 14, 2019

Thinking Outside the Pool: Active Training Image Creation for Relative Attributes

Conference on Computer Vision and Pattern Recognition (CVPR)

Current wisdom suggests more labeled image data is always better, and obtaining labels is the bottleneck. Yet curating a pool of sufficiently diverse and informative images is itself a challenge. In particular, training image curation is problematic for fine-grained attributes, where the subtle visual differences of interest may be rare within traditional image sources. We propose an active image generation approach to address this issue.

By: Aron Yu, Kristen Grauman

June 14, 2019

2.5D Visual Sound

Conference on Computer Vision and Pattern Recognition (CVPR)

Binaural audio provides a listener with 3D sound sensation, allowing a rich perceptual experience of the scene. However, binaural recordings are scarcely available and require nontrivial expertise and equipment to obtain. We propose to convert common monaural audio into binaural audio by leveraging video.

By: Ruohan Gao, Kristen Grauman

June 13, 2019

Multi-modal Content Localization in Videos Using Weak Supervision

International Conference on Machine Learning (ICML)

Identifying the temporal segments in a video that contain content relevant to a category or task is a difficult but interesting problem. This has applications in fine-grained video indexing and retrieval. Part of the difficulty in this problem comes from the lack of supervision since large-scale annotation of localized segments containing the content of interest is very expensive. In this paper, we propose to use the category assigned to an entire video as weak supervision to our model.

By: Gourab Kundu, Prahal Arora, Ferdi Adeputra, Polina Kuznetsova, Daniel McKinnon, Michelle Cheung, Larry Anazia, Geoffrey Zweig

June 11, 2019

Adversarial Inference for Multi-Sentence Video Description

Conference on Computer Vision and Pattern Recognition (CVPR)

While significant progress has been made in the image captioning task, video description is still in its infancy due to the complex nature of video data. Generating multi-sentence descriptions for long videos is even more challenging. Among the main issues are the fluency and coherence of the generated descriptions, and their relevance to the video.

By: Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach

June 10, 2019

GEOMetrics: Exploiting Geometric Structure for Graph-Encoded Objects

International Conference on Machine Learning (ICML)

Mesh models are a promising approach for encoding the structure of 3D objects. Current mesh reconstruction systems predict uniformly distributed vertex locations of a predetermined graph through a series of graph convolutions, leading to compromises with respect to performance or resolution. In this paper, we argue that the graph representation of geometric objects allows for additional structure, which should be leveraged for enhanced reconstruction.

By: Edward J. Smith, Scott Fujimoto, Adriana Romero, David Meger