
November 24, 2018

On Periodic Functions as Regularizers for Quantization of Neural Networks

ArXiv

Deep learning models have been successfully used in computer vision and many other fields. We propose an unorthodox algorithm for performing quantization of the model parameters.

By: Maxim Naumov, Utku Diril, Jongsoo Park, Benjamin Ray, Jedrzej Jablonski, Andrew Tulloch
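The snippet above does not say how the periodic regularizer works, so the following is only a minimal sketch of the idea named in the title, under stated assumptions. It uses a sin² penalty whose period equals a quantization step `step`. The penalty is zero exactly on the grid points `k * step`, so adding it to a training loss pulls weights toward quantized values. The functions `periodic_penalty`, `periodic_penalty_grad` and `pull_toward_grid`, and the `strength` and `lr` values, are hypothetical and do not reproduce the authors' exact formulation.

```python
import math


def periodic_penalty(weights, step, strength=1.0):
    """Hypothetical periodic regularizer: strength * sum of sin^2(pi * w / step).

    Each term is 0 when w is an integer multiple of `step` (a grid point)
    and reaches `strength` halfway between two grid points.
    """
    return strength * sum(math.sin(math.pi * w / step) ** 2 for w in weights)


def periodic_penalty_grad(w, step, strength=1.0):
    """Derivative of strength * sin^2(pi * w / step) with respect to w."""
    # d/dw sin^2(a*w) = a * sin(2*a*w), with a = pi / step
    a = math.pi / step
    return strength * a * math.sin(2 * a * w)


def pull_toward_grid(w, step, lr=1e-3, iters=200):
    """Plain gradient descent on the penalty alone (no task loss)."""
    for _ in range(iters):
        w -= lr * periodic_penalty_grad(w, step)
    return w


# A weight at 0.3 with step 0.25 drifts to its nearest grid point, 0.25.
print(round(pull_toward_grid(0.3, step=0.25), 6))
```

In real training this penalty would be added to the task loss with a small weight, so the network trades accuracy against closeness to the grid instead of rounding weights after the fact.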

November 2, 2018

Do explanations make VQA models more predictable to a human?

Empirical Methods in Natural Language Processing (EMNLP)

A rich line of research attempts to make deep neural networks more transparent by generating human-interpretable ‘explanations’ of their decision process, especially for interactive tasks like Visual Question Answering (VQA). In this work, we analyze whether existing explanations indeed make a VQA model, including both its responses and its failures, more predictable to a human.

By: Arjun Chandrasekaran, Viraj Prabhu, Deshraj Yadav, Prithvijit Chattopadhyay, Devi Parikh

October 31, 2018

A Dataset for Telling the Stories of Social Media Videos

Empirical Methods in Natural Language Processing (EMNLP)

Video content on social media platforms constitutes a major part of the communication between people, as it allows everyone to share their stories. However, if someone is unable to consume video, whether due to a disability or limited network bandwidth, this severely limits their participation and communication.

By: Spandana Gella, Mike Lewis, Marcus Rohrbach

October 29, 2018

Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition

Conference on Robot Learning (CoRL)

In an open-world setting, it is inevitable that an intelligent agent (e.g., a robot) will encounter visual objects, attributes or relationships it does not recognize. In this work, we develop an agent empowered with visual curiosity, i.e., the ability to ask questions to an Oracle (e.g., a human) about the contents of images (e.g., ‘What is the object on the left side of the red cube?’) and build a visual recognition model based on the answers received (e.g., ‘Cylinder’).

By: Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh

September 14, 2018

Visual Coreference Resolution in Visual Dialog using Neural Module Networks

European Conference on Computer Vision (ECCV)

In this work, we propose a neural module network architecture for visual dialog by introducing two novel modules, Refer and Exclude, that perform explicit, grounded coreference resolution at a finer word level.

By: Satwik Kottur, José M.F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach

September 10, 2018

Predicting Future Instance Segmentation by Forecasting Convolutional Features

European Conference on Computer Vision (ECCV)

Anticipating future events is an important prerequisite for intelligent behavior. Video forecasting has been studied as a proxy task towards this goal. Recent work has shown that to predict semantic segmentation of future frames, forecasting at the semantic level is more effective than forecasting RGB frames and then segmenting them. In this paper we consider the more challenging problem of future instance segmentation, which additionally segments out individual objects.

By: Pauline Luc, Camille Couprie, Yann LeCun, Jakob Verbeek

September 10, 2018

Value-aware Quantization for Training and Inference of Neural Networks

European Conference on Computer Vision (ECCV)

We propose a novel value-aware quantization which applies aggressively reduced precision to the majority of data while separately handling a small number of large values in high precision, which reduces the total quantization error under very low precision.

By: Eunhyeok Park, Sungjoo Yoo, Peter Vajda
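The core idea in this abstract is low precision for the bulk of values and high precision for a small set of large ones. A simplified plain-Python illustration follows. It is not the authors' method, and it makes these assumptions: the outliers are the top `outlier_fraction` of values by magnitude, and the remaining values use a symmetric uniform grid scaled to the largest remaining magnitude. The function names and parameters are hypothetical.

```python
import math


def value_aware_quantize(values, outlier_fraction=0.01, bits=4):
    """Hypothetical sketch: keep the largest-magnitude values exact and
    quantize the rest to a `bits`-bit symmetric uniform grid."""
    n = len(values)
    k = math.ceil(outlier_fraction * n)  # number of high-precision outliers
    # The k largest-magnitude entries are kept in full precision.
    by_magnitude = sorted(range(n), key=lambda i: abs(values[i]), reverse=True)
    outliers = set(by_magnitude[:k])

    # The low-precision grid only needs to cover the remaining (small) values.
    dense_max = max((abs(values[i]) for i in range(n) if i not in outliers), default=0.0)
    levels = 2 ** (bits - 1) - 1  # integer codes in [-levels, levels], e.g. +/-7 for 4 bits
    step = dense_max / levels if dense_max > 0 else 1.0

    result = []
    for i, v in enumerate(values):
        if i in outliers:
            result.append(v)  # high-precision path
        else:
            code = max(-levels, min(levels, round(v / step)))
            result.append(code * step)  # low-precision path
    return result


def total_error(original, quantized):
    """Sum of absolute quantization errors."""
    return sum(abs(a - b) for a, b in zip(original, quantized))


data = [0.1, -0.2, 0.05, 5.0]
plain = value_aware_quantize(data, outlier_fraction=0.0)   # ordinary uniform quantization
aware = value_aware_quantize(data, outlier_fraction=0.25)  # 5.0 kept exact
print(total_error(data, plain), total_error(data, aware))
```

With `outlier_fraction=0.0` the function degenerates to plain uniform quantization. In that case the single large value stretches the grid, and the small values collapse to zero. Setting that one value aside shrinks the step size for everything else, which is why the total error drops.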

September 10, 2018

Dense Pose Transfer

European Conference on Computer Vision (ECCV)

In this work we integrate ideas from surface-based modeling with neural synthesis: we propose a combination of surface-based pose estimation and deep generative models that allows us to perform accurate pose transfer, i.e. synthesize a new image of a person based on a single image of that person and the image of a pose donor.

By: Natalia Neverova, Riza Alp Guler, Iasonas Kokkinos

September 9, 2018

Graph R-CNN for Scene Graph Generation

European Conference on Computer Vision (ECCV)

We propose a novel scene graph generation model called Graph R-CNN, which is both effective and efficient at detecting objects and their relations in images.

By: Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh

September 9, 2018

DDRNet: Depth Map Denoising and Refinement for Consumer Depth Cameras Using Cascaded CNNs

European Conference on Computer Vision (ECCV)

Although much progress has been made in reducing noise and boosting geometric detail in consumer depth maps, the problem is still far from solved because of its inherent ill-posedness and the real-time requirement. We propose a cascaded Depth Denoising and Refinement Network (DDRNet) to tackle this problem by leveraging the multi-frame fused geometry and the accompanying high-quality color image through a joint training strategy.

By: Shi Yan, Chenglei Wu, Lizhen Wang, Feng Xu, Liang An, Kaiwen Guo, Yebin Liu