Research Area
Year Published

104 Results

May 4, 2019

Quasi-Hyperbolic Momentum and Adam for Deep Learning

International Conference on Learning Representations (ICLR)

Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. We describe numerous connections to and identities with other algorithms, and we characterize the set of two-state optimization algorithms that QHM can recover.

By: Jerry Ma, Denis Yarats

January 30, 2019

Large-Scale Visual Relationship Understanding

AAAI Conference on Artificial Intelligence (AAAI)

Large scale visual understanding is challenging, as it requires a model to handle the widely-spread and imbalanced distribution of triples. In real-world scenarios with large numbers of objects and relations, some are seen very commonly while others are barely seen. We develop a new relationship detection model that embeds objects and relations into two vector spaces where both discriminative capability and semantic affinity are preserved.

By: Ji Zhang, Yannis Kalantidis, Marcus Rohrbach, Manohar Paluri, Ahmed Elgammal, Mohamed Elhoseiny

December 8, 2018

From Satellite Imagery to Disaster Insights

AI for Social Good Workshop at NeurIPS 2018

The use of satellite imagery has become increasingly popular for disaster monitoring and response. After a disaster, it is important to prioritize rescue operations, disaster response and coordinate relief efforts. These have to be carried out in a fast and efficient manner since resources are often limited in disaster affected areas and it’s extremely important to identify the areas of maximum damage. However, most of the existing disaster mapping efforts are manual which is time-consuming and often leads to erroneous results.

By: Jigar Doshi, Saikat Basu, Guan Pang

December 4, 2018

A2-Nets: Double Attention Networks

Neural Information Processing Systems (NeurIPS)

Learning to capture long-range relations is fundamental to image/video recognition. Existing CNN models generally rely on increasing depth to model such relations which is highly inefficient. In this work, we propose the “double attention block”, a novel component that aggregates and propagates informative global features from the entire spatio-temporal space of input images/videos, enabling subsequent convolution layers to access features from the entire space efficiently.

By: Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng

November 27, 2018

Deep Incremental Learning for Efficient High-Fidelity Face Tracking


In this paper, we present an incremental learning framework for efficient and accurate facial performance tracking. Our approach is to alternate the modeling step, which takes tracked meshes and texture maps to train our deep learning-based statistical model, and the tracking step, which takes predictions of geometry and texture our model infers from measured images and optimize the predicted geometry by minimizing image, geometry and facial landmark errors.

By: Chenglei Wu, Takaaki Shiratori, Yaser Sheikh

November 2, 2018

Do explanations make VQA models more predictable to a human?

Empirical Methods in Natural Language Processing (EMNLP)

A rich line of research attempts to make deep neural networks more transparent by generating human-interpretable ‘explanations’ of their decision process, especially for interactive tasks like Visual Question Answering (VQA). In this work, we analyze if existing explanations indeed make a VQA model – its responses as well as failures – more predictable to a human.

By: Arjun Chandrasekaran, Viraj Prabhu, Deshraj Yadav, Prithvijit Chattopadhyay, Devi Parikh

October 31, 2018

A Dataset for Telling the Stories of Social Media Videos

Empirical Methods in Natural Language Processing (EMNLP)

Video content on social media platforms constitutes a major part of the communication between people, as it allows everyone to share their stories. However, if someone is unable to consume video, either due to a disability or network bandwidth, this severely limits their participation and communication.

By: Spandana Gella, Mike Lewis, Marcus Rohrbach

October 29, 2018

Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition

Conference on Robot Learning (CoRL)

In an open-world setting, it is inevitable that an intelligent agent (e.g., a robot) will encounter visual objects, attributes or relationships it does not recognize. In this work, we develop an agent empowered with visual curiosity, i.e. the ability to ask questions to an Oracle (e.g., human) about the contents in images (e.g., ‘What is the object on the left side of the red cube?’) and build visual recognition model based on the answers received (e.g., ‘Cylinder’).

By: Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh

September 10, 2018

Predicting Future Instance Segmentation by Forecasting Convolutional Features

European Conference on Computer Vision (ECCV)

Anticipating future events is an important prerequisite towards intelligent behavior. Video forecasting has been studied as a proxy task towards this goal. Recent work has shown that to predict semantic segmentation of future frames, forecasting at the semantic level is more effective than forecasting RGB frames and then segmenting these. In this paper we consider the more challenging problem of future instance segmentation, which additionally segments out individual objects.

By: Pauline Luc, Camille Couprie, Yann LeCun, Jakob Verbeek

September 10, 2018

Value-aware Quantization for Training and Inference of Neural Networks

European Conference on Computer Vision (ECCV)

We propose a novel value-aware quantization which applies aggressively reduced precision to the majority of data while separately handling a small amount of large values in high precision, which reduces total quantization errors under very low precision.

By: Eunhyeok Park, Sungjoo Yoo, Peter Vajda