December 8, 2018

From Satellite Imagery to Disaster Insights

AI for Social Good Workshop at NeurIPS 2018

The use of satellite imagery has become increasingly popular for disaster monitoring and response. After a disaster, it is important to prioritize rescue operations and disaster response and to coordinate relief efforts. These have to be carried out quickly and efficiently, since resources are often limited in disaster-affected areas and it is extremely important to identify the areas of maximum damage. However, most existing disaster mapping efforts are manual, which is time-consuming and often leads to erroneous results.

By: Jigar Doshi, Saikat Basu, Guan Pang
December 4, 2018

A2-Nets: Double Attention Networks

Neural Information Processing Systems (NeurIPS)

Learning to capture long-range relations is fundamental to image/video recognition. Existing CNN models generally rely on increasing depth to model such relations, which is highly inefficient. In this work, we propose the “double attention block”, a novel component that aggregates and propagates informative global features from the entire spatio-temporal space of input images/videos, enabling subsequent convolution layers to access features from the entire space efficiently.

By: Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng
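
The aggregate-then-propagate idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no formulas, so the two-step structure below (a softmax over positions to gather global descriptors, then a softmax over descriptors at each position to distribute them) is an assumption, and the names `double_attention`, `gather_logits`, and `distribute_logits` are hypothetical. In a real network the logits would presumably come from learned layers; here they are plain inputs.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def double_attention(features, gather_logits, distribute_logits):
    """Illustrative two-step attention (assumed structure, not the paper's code).

    features:          C x N list of lists (C channels, N positions)
    gather_logits:     M x N scores, one attention map per global descriptor
    distribute_logits: M x N scores used to mix descriptors at each position
    Returns a C x N list of lists.
    """
    n_desc = len(gather_logits)
    n_pos = len(features[0])

    # Step 1 (gather): normalize each attention map over all positions, then
    # pool the features with it, giving one C-dimensional descriptor per map.
    gather = [softmax(row) for row in gather_logits]  # M x N
    descriptors = [
        [sum(f * g for f, g in zip(channel, attn)) for attn in gather]
        for channel in features
    ]  # C x M

    # Step 2 (distribute): at each position, normalize over the M descriptors
    # and return their weighted sum to that position.
    mix = [
        softmax([distribute_logits[k][n] for k in range(n_desc)])
        for n in range(n_pos)
    ]  # N x M
    return [
        [sum(desc[k] * mix[n][k] for k in range(n_desc)) for n in range(n_pos)]
        for desc in descriptors
    ]  # C x N
```

With all logits set to zero, both softmaxes are uniform, so every output position receives the per-channel mean of the input. That is the simplest case of each position seeing features from the entire space in one step.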
November 27, 2018

Deep Incremental Learning for Efficient High-Fidelity Face Tracking

ACM SIGGRAPH ASIA 2018

In this paper, we present an incremental learning framework for efficient and accurate facial performance tracking. Our approach alternates between a modeling step, which takes tracked meshes and texture maps to train our deep learning-based statistical model, and a tracking step, which takes the predictions of geometry and texture that our model infers from measured images and optimizes the predicted geometry by minimizing image, geometry and facial landmark errors.

By: Chenglei Wu, Takaaki Shiratori, Yaser Sheikh
November 2, 2018

Do explanations make VQA models more predictable to a human?

Empirical Methods in Natural Language Processing (EMNLP)

A rich line of research attempts to make deep neural networks more transparent by generating human-interpretable ‘explanations’ of their decision process, especially for interactive tasks like Visual Question Answering (VQA). In this work, we analyze whether existing explanations indeed make a VQA model – its responses as well as its failures – more predictable to a human.

By: Arjun Chandrasekaran, Viraj Prabhu, Deshraj Yadav, Prithvijit Chattopadhyay, Devi Parikh
October 31, 2018

A Dataset for Telling the Stories of Social Media Videos

Empirical Methods in Natural Language Processing (EMNLP)

Video content on social media platforms constitutes a major part of the communication between people, as it allows everyone to share their stories. However, if someone is unable to consume video, due to either a disability or limited network bandwidth, this severely limits their participation and communication.

By: Spandana Gella, Mike Lewis, Marcus Rohrbach
September 10, 2018

Predicting Future Instance Segmentation by Forecasting Convolutional Features

European Conference on Computer Vision (ECCV)

Anticipating future events is an important prerequisite towards intelligent behavior. Video forecasting has been studied as a proxy task towards this goal. Recent work has shown that to predict semantic segmentation of future frames, forecasting at the semantic level is more effective than forecasting RGB frames and then segmenting these. In this paper we consider the more challenging problem of future instance segmentation, which additionally segments out individual objects.

By: Pauline Luc, Camille Couprie, Yann LeCun, Jakob Verbeek
September 10, 2018

Value-aware Quantization for Training and Inference of Neural Networks

European Conference on Computer Vision (ECCV)

We propose a novel value-aware quantization method that applies aggressively reduced precision to the majority of data while separately handling a small number of large values in high precision, reducing total quantization error under very low precision.

By: Eunhyeok Park, Sungjoo Yoo, Peter Vajda
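
The idea described in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' method: the function name `value_aware_quantize`, the `large_ratio` and `bits` parameters, the choice of a uniform symmetric quantizer, and selecting "large" values by magnitude are all hypothetical choices made for the example.

```python
def value_aware_quantize(values, large_ratio=0.02, bits=3):
    """Quantize most values to `bits` bits; keep the largest ones unchanged.

    Hypothetical sketch: a fraction `large_ratio` of the values with the
    largest magnitude is kept in full precision, and the rest go through a
    uniform symmetric quantizer whose range covers only the remaining values.
    Returns (quantized_values, indices_kept_in_full_precision).
    """
    n = len(values)
    if n == 0:
        return [], []

    # Pick the indices of the largest-magnitude values (assumed criterion).
    n_large = int(round(n * large_ratio))
    by_magnitude = sorted(range(n), key=lambda i: abs(values[i]), reverse=True)
    large_idx = set(by_magnitude[:n_large])

    # The quantizer step is set by the largest of the *remaining* values,
    # so a few outliers no longer stretch the low-precision range.
    max_small = max(
        (abs(values[i]) for i in range(n) if i not in large_idx), default=0.0
    )
    levels = 2 ** (bits - 1) - 1  # positive levels of a signed quantizer
    scale = max_small / levels if max_small > 0 else 1.0

    quantized = [
        v if i in large_idx else round(v / scale) * scale
        for i, v in enumerate(values)
    ]
    return quantized, sorted(large_idx)
```

Setting `large_ratio=0` gives plain uniform quantization over the full range, which is a convenient baseline: on data with a single outlier, the step size is then dictated by that outlier and the small values lose most of their resolution.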
September 10, 2018

Dense Pose Transfer

European Conference on Computer Vision (ECCV)

In this work we integrate ideas from surface-based modeling with neural synthesis: we propose a combination of surface-based pose estimation and deep generative models that allows us to perform accurate pose transfer, i.e. synthesize a new image of a person based on a single image of that person and the image of a pose donor.

By: Natalia Neverova, Riza Alp Guler, Iasonas Kokkinos
September 9, 2018

Graph R-CNN for Scene Graph Generation

European Conference on Computer Vision (ECCV)

We propose a novel scene graph generation model called Graph R-CNN that is both effective and efficient at detecting objects and their relations in images.

By: Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh
September 9, 2018

DDRNet: Depth Map Denoising and Refinement for Consumer Depth Cameras Using Cascaded CNNs

European Conference on Computer Vision (ECCV)

Although much progress has been made in reducing noise and boosting geometric detail, the problem of depth map denoising and refinement is still far from being solved, owing to its inherent ill-posedness and the real-time requirement. We propose a cascaded Depth Denoising and Refinement Network (DDRNet) to tackle this problem by leveraging the multi-frame fused geometry and the accompanying high-quality color image through a joint training strategy.

By: Shi Yan, Chenglei Wu, Lizhen Wang, Feng Xu, Liang An, Kaiwen Guo, Yebin Liu