October 31, 2018

A Dataset for Telling the Stories of Social Media Videos

Empirical Methods in Natural Language Processing (EMNLP)

Video content on social media platforms constitutes a major part of the communication between people, as it allows everyone to share their stories. However, if someone is unable to consume video, due either to a disability or to limited network bandwidth, this severely limits their participation and communication.

By: Spandana Gella, Mike Lewis, Marcus Rohrbach

October 29, 2018

Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition

Conference on Robot Learning (CoRL)

In an open-world setting, it is inevitable that an intelligent agent (e.g., a robot) will encounter visual objects, attributes or relationships it does not recognize. In this work, we develop an agent empowered with visual curiosity, i.e., the ability to ask questions to an Oracle (e.g., a human) about the contents of images (e.g., ‘What is the object on the left side of the red cube?’) and build a visual recognition model based on the answers received (e.g., ‘Cylinder’).

By: Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh

September 14, 2018

Visual Coreference Resolution in Visual Dialog using Neural Module Networks

European Conference on Computer Vision (ECCV)

In this work, we propose a neural module network architecture for visual dialog by introducing two novel modules—Refer and Exclude—that perform explicit, grounded, coreference resolution at a finer word level.

By: Satwik Kottur, José M.F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach

September 10, 2018

Predicting Future Instance Segmentation by Forecasting Convolutional Features

European Conference on Computer Vision (ECCV)

Anticipating future events is an important prerequisite towards intelligent behavior. Video forecasting has been studied as a proxy task towards this goal. Recent work has shown that to predict semantic segmentation of future frames, forecasting at the semantic level is more effective than forecasting RGB frames and then segmenting these. In this paper we consider the more challenging problem of future instance segmentation, which additionally segments out individual objects.

By: Pauline Luc, Camille Couprie, Yann LeCun, Jakob Verbeek

September 10, 2018

Value-aware Quantization for Training and Inference of Neural Networks

European Conference on Computer Vision (ECCV)

We propose a novel value-aware quantization that applies aggressively reduced precision to the majority of data while separately handling a small number of large values in high precision, thereby reducing the total quantization error under very low precision.

By: Eunhyeok Park, Sungjoo Yoo, Peter Vajda
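
The idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `bits` and `large_ratio` parameters, and the symmetric uniform quantizer used for the low-precision part are all assumptions chosen for illustration. The sketch keeps the few largest-magnitude values at full precision and quantizes the rest, then compares the squared error against quantizing everything with one shared scale.

```python
import math


def uniform_quantize(values, bits):
    """Symmetric uniform quantization to `bits` bits (assumed scheme, bits >= 2)."""
    levels = 2 ** (bits - 1) - 1
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in values]
    scale = max_abs / levels
    return [round(v / scale) * scale for v in values]


def value_aware_quantize(values, bits=3, large_ratio=0.02):
    """Quantize most values to low precision; keep the largest ones untouched.

    `large_ratio` (hypothetical parameter) is the fraction of values,
    ranked by magnitude, that stays at full precision.
    """
    n_large = math.ceil(large_ratio * len(values))
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    small_idx = order[:len(order) - n_large]
    out = list(values)  # entries for large values are left as-is
    quantized = uniform_quantize([values[i] for i in small_idx], bits)
    for i, q in zip(small_idx, quantized):
        out[i] = q
    return out


def squared_error(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Illustrative data: many small values plus one large outlier.
data = [i / 100 for i in range(-50, 50)] + [10.0]
plain = uniform_quantize(data, bits=3)
aware = value_aware_quantize(data, bits=3, large_ratio=0.02)
print("plain error:", squared_error(data, plain))
print("value-aware error:", squared_error(data, aware))
```

With a single shared scale, the outlier stretches the quantization range so that the small values collapse onto very few levels; handling the outlier separately lets the low-precision levels cover only the range where most of the data lies.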

September 10, 2018

Dense Pose Transfer

European Conference on Computer Vision (ECCV)

In this work we integrate ideas from surface-based modeling with neural synthesis: we propose a combination of surface-based pose estimation and deep generative models that allows us to perform accurate pose transfer, i.e., to synthesize a new image of a person based on a single image of that person and an image of a pose donor.

By: Natalia Neverova, Riza Alp Guler, Iasonas Kokkinos

September 9, 2018

Graph R-CNN for Scene Graph Generation

European Conference on Computer Vision (ECCV)

We propose a novel scene graph generation model called Graph R-CNN that is both effective and efficient at detecting objects and their relations in images.

By: Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh

September 9, 2018

DDRNet: Depth Map Denoising and Refinement for Consumer Depth Cameras Using Cascaded CNNs

European Conference on Computer Vision (ECCV)

Although much progress has been made in reducing noise and boosting geometric detail in consumer depth maps, the problem is still far from solved, owing to its inherent ill-posedness and the real-time requirement. We propose a cascaded Depth Denoising and Refinement Network (DDRNet) to tackle this problem by leveraging the multi-frame fused geometry and the accompanying high-quality color image through a joint training strategy.

By: Shi Yan, Chenglei Wu, Lizhen Wang, Feng Xu, Liang An, Kaiwen Guo, Yebin Liu

September 9, 2018

Multi-Fiber Networks for Video Recognition

European Conference on Computer Vision (ECCV)

In this paper, we aim to reduce the computational cost of spatio-temporal deep neural networks, making them run as fast as their 2D counterparts while preserving state-of-the-art accuracy on video recognition benchmarks.

By: Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng

September 9, 2018

Deep Clustering for Unsupervised Learning of Visual Features

European Conference on Computer Vision (ECCV)

In this work, we present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features.

By: Mathilde Caron, Piotr Bojanowski, Armand Joulin, Matthijs Douze