May 1, 2019

Large-scale weakly-supervised pre-training for video action recognition

Computer Vision and Pattern Recognition (CVPR)

Current fully-supervised video datasets consist of only a few hundred thousand videos and fewer than a thousand domain-specific labels. This hinders progress towards advanced video architectures. This paper presents an in-depth study of using large volumes of web videos for pre-training video models for the task of action recognition.
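
As a rough illustration of the recipe described above, the sketch below pre-trains a clip encoder on weakly labeled web video and then fine-tunes it on a smaller target action dataset; the tiny network, random data, and label counts are placeholders, not the paper's setup.

```python
# Minimal sketch of pre-train-then-fine-tune; the backbone and random "datasets"
# below are stand-ins, not the paper's models or data.
import torch
import torch.nn as nn

class TinyVideoNet(nn.Module):
    """Stand-in clip encoder: 3D conv + global pooling -> feature vector."""
    feature_dim = 32
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv3d(3, self.feature_dim, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool3d(1)
    def forward(self, x):                      # x: (N, 3, T, H, W)
        return self.pool(self.conv(x)).flatten(1)

def train(model, loader, num_classes, lr):
    # A fresh classifier head per stage; the backbone weights carry over.
    head = nn.Linear(model.feature_dim, num_classes)
    opt = torch.optim.SGD(list(model.parameters()) + list(head.parameters()), lr=lr)
    for clips, labels in loader:               # labels: hashtags (pre-train) or actions (fine-tune)
        loss = nn.functional.cross_entropy(head(model(clips)), labels)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

def fake_loader(num_classes, batches=2):       # random clips standing in for real video
    return [(torch.randn(4, 3, 8, 16, 16), torch.randint(num_classes, (4,)))
            for _ in range(batches)]

backbone = TinyVideoNet()
backbone = train(backbone, fake_loader(1000), num_classes=1000, lr=0.1)   # weak web-video labels
backbone = train(backbone, fake_loader(101),  num_classes=101,  lr=0.01)  # target action dataset
```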

By: Deepti Ghadiyaram, Matt Feiszli, Du Tran, Xueting Yan, Heng Wang, Dhruv Mahajan

April 28, 2019

Inverse Path Tracing for Joint Material and Lighting Estimation

Computer Vision and Pattern Recognition (CVPR)

Modern computer vision algorithms have brought significant advancement to 3D geometry reconstruction. However, illumination and material reconstruction remain less studied, with current approaches assuming very simplified models for materials and illumination. We introduce Inverse Path Tracing, a novel approach to jointly estimate the material properties of objects and light sources in indoor scenes by using an invertible light transport simulation.
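
To convey the analysis-by-synthesis idea, here is a toy sketch that jointly optimizes a material parameter and a light source against observed renderings via gradient descent; the Lambertian "renderer" is a stand-in assumption, not the paper's differentiable path tracer.

```python
# Toy illustration of jointly estimating material and lighting by inverting a
# differentiable renderer; a Lambertian shader stands in for full path tracing.
import torch

normals = torch.nn.functional.normalize(torch.randn(1000, 3), dim=1)  # known scene geometry
true_albedo, true_light = 0.6, torch.tensor([0.0, 0.0, 2.0])

def render(albedo, light):
    # Lambertian shading: pixel = albedo * max(0, n . l)
    return albedo * (normals @ light).clamp(min=0.0)

observed = render(true_albedo, true_light)                 # "photographs" of the scene

albedo = torch.tensor(0.2, requires_grad=True)             # unknown material
light = torch.zeros(3, requires_grad=True); light.data[2] = 0.5  # unknown light source
opt = torch.optim.Adam([albedo, light], lr=0.05)
for _ in range(500):
    loss = (render(albedo, light) - observed).pow(2).mean()  # photometric error
    opt.zero_grad(); loss.backward(); opt.step()

# The product albedo * light converges to the true product
# (note the inherent scale ambiguity between material and light intensity).
print(albedo.item(), light.detach())
```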

By: Dejan Azinović, Tzu-Mao Li, Anton S. Kaplanyan, Matthias Nießner

March 12, 2019

Convolutional neural networks for mesh-based parcellation of the cerebral cortex

Medical Imaging with Deep Learning (MIDL)

We show experimentally on the Human Connectome Project dataset that the proposed graph convolutional models outperform current state-of-the-art and baselines, highlighting the potential and applicability of these methods to tackle neuroimaging challenges, paving the road towards a better characterization of brain diseases.
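
For readers unfamiliar with graph convolutions on mesh graphs, the sketch below shows one generic normalized-aggregation layer on a toy graph; the paper's exact graph operators and the cortical-surface data are not reproduced here.

```python
# One generic graph-convolution layer (normalized adjacency aggregation);
# the 3-vertex "mesh" is a toy example, not Human Connectome Project data.
import torch

def gcn_layer(H, A, W):
    """H: (N, F) node features, A: (N, N) adjacency, W: (F, F_out) weights."""
    A_hat = A + torch.eye(A.shape[0])                 # add self-loops
    d = A_hat.sum(dim=1)
    D_inv_sqrt = torch.diag(d.pow(-0.5))
    return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

A = torch.tensor([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])  # triangle graph
H = torch.randn(3, 4)                                         # 4 features per vertex
W = torch.randn(4, 8)
print(gcn_layer(H, A, W).shape)                               # (3, 8) per-vertex features
```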

By: Guillem Cucurull, Konrad Wagstyl, Arantxa Casanova, Petar Velickovic, Estrid Jakobsen, Michal Drozdzal, Adriana Romero, Alan Evans, Yoshua Bengio

January 30, 2019

Large-Scale Visual Relationship Understanding

AAAI Conference on Artificial Intelligence (AAAI)

Large-scale visual understanding is challenging, as it requires a model to handle the widely-spread and imbalanced distribution of ⟨subject, relation, object⟩ triples. In real-world scenarios with large numbers of objects and relations, some are seen very commonly while others are barely seen. We develop a new relationship detection model that embeds objects and relations into two vector spaces where both discriminative capability and semantic affinity are preserved.
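
A hedged sketch of the general idea, scoring a ⟨subject, relation, object⟩ triple with separate object and relation embedding spaces, is given below; the dimensions, fusion, and cosine-similarity scoring are illustrative assumptions rather than the paper's exact model.

```python
# Illustrative triple scorer with separate object and relation embedding spaces;
# the fusion and similarity choices here are assumptions for demonstration only.
import torch
import torch.nn as nn

class TripleScorer(nn.Module):
    def __init__(self, num_objects, num_relations, dim=64):
        super().__init__()
        self.obj_emb = nn.Embedding(num_objects, dim)    # object (entity) space
        self.rel_emb = nn.Embedding(num_relations, dim)  # relation space
        self.proj = nn.Linear(3 * dim, dim)              # fuse subject, relation, object

    def forward(self, subj, rel, obj, visual_feat):
        t = torch.cat([self.obj_emb(subj), self.rel_emb(rel), self.obj_emb(obj)], dim=-1)
        # Similarity between the embedded triple and the visual feature of the region pair.
        return torch.cosine_similarity(self.proj(t), visual_feat, dim=-1)

scorer = TripleScorer(num_objects=1000, num_relations=100)
score = scorer(torch.tensor([3]), torch.tensor([7]), torch.tensor([42]), torch.randn(1, 64))
print(score)
```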

By: Ji Zhang, Yannis Kalantidis, Marcus Rohrbach, Manohar Paluri, Ahmed Elgammal, Mohamed Elhoseiny

December 8, 2018

From Satellite Imagery to Disaster Insights

AI for Social Good Workshop at NeurIPS 2018

The use of satellite imagery has become increasingly popular for disaster monitoring and response. After a disaster, it is important to prioritize rescue operations and disaster response and to coordinate relief efforts. These have to be carried out quickly and efficiently, since resources are often limited in disaster-affected areas and identifying the areas of maximum damage is critical. However, most existing disaster-mapping efforts are manual, which is time-consuming and often leads to erroneous results.

By: Jigar Doshi, Saikat Basu, Guan Pang

December 4, 2018

A²-Nets: Double Attention Networks

Neural Information Processing Systems (NeurIPS)

Learning to capture long-range relations is fundamental to image/video recognition. Existing CNN models generally rely on increasing depth to model such relations, which is highly inefficient. In this work, we propose the “double attention block”, a novel component that aggregates and propagates informative global features from the entire spatio-temporal space of input images/videos, enabling subsequent convolution layers to access features from the entire space efficiently.
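
The gather-and-distribute structure can be sketched as a simplified block for 2D feature maps, as below; the channel sizes and residual wiring are assumptions, not the exact published design.

```python
# Simplified gather-and-distribute ("double attention") block for 2D feature maps.
import torch
import torch.nn as nn

class DoubleAttention2d(nn.Module):
    def __init__(self, in_ch, m=32, n=32):
        super().__init__()
        self.phi = nn.Conv2d(in_ch, m, 1)     # features to gather
        self.theta = nn.Conv2d(in_ch, n, 1)   # attention maps for gathering
        self.rho = nn.Conv2d(in_ch, n, 1)     # attention vectors for distributing
        self.out = nn.Conv2d(m, in_ch, 1)

    def forward(self, x):                     # x: (N, C, H, W)
        N, C, H, W = x.shape
        A = self.phi(x).flatten(2)                        # (N, m, HW)
        B = self.theta(x).flatten(2).softmax(dim=-1)      # (N, n, HW), softmax over positions
        V = self.rho(x).flatten(2).softmax(dim=1)         # (N, n, HW), softmax over the n maps
        G = A @ B.transpose(1, 2)                         # (N, m, n): gather global descriptors
        Z = G @ V                                         # (N, m, HW): distribute them back
        return x + self.out(Z.view(N, -1, H, W))          # residual connection

x = torch.randn(2, 64, 16, 16)
print(DoubleAttention2d(64)(x).shape)                     # (2, 64, 16, 16)
```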

By: Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng

November 27, 2018

Deep Incremental Learning for Efficient High-Fidelity Face Tracking

ACM SIGGRAPH ASIA 2018

In this paper, we present an incremental learning framework for efficient and accurate facial performance tracking. Our approach alternates between the modeling step, which takes tracked meshes and texture maps to train our deep learning-based statistical model, and the tracking step, which takes the geometry and texture predictions our model infers from measured images and optimizes the predicted geometry by minimizing image, geometry, and facial landmark errors.
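
The alternation can be sketched as a simple loop; every function here is a hypothetical stub standing in for the paper's actual model fitting and tracking.

```python
# High-level sketch of alternating modeling and tracking; the stubs below are
# hypothetical placeholders, not the paper's training or optimization code.
import torch

def update_model(model_state, meshes, textures):
    """Modeling step: refit the statistical face model to the tracked data (stub)."""
    return {"mean_mesh": torch.stack(meshes).mean(0), "mean_tex": torch.stack(textures).mean(0)}

def track_frame(model_state, image):
    """Tracking step: predict geometry/texture, then refine against the image (stub)."""
    geometry = model_state["mean_mesh"] + 0.01 * torch.randn_like(model_state["mean_mesh"])
    texture = model_state["mean_tex"]
    return geometry, texture

model_state = {"mean_mesh": torch.zeros(100, 3), "mean_tex": torch.zeros(64, 64, 3)}
meshes, textures = [], []
for image in [torch.rand(256, 256, 3) for _ in range(5)]:      # incoming video frames
    geometry, texture = track_frame(model_state, image)        # track with the current model
    meshes.append(geometry); textures.append(texture)
    model_state = update_model(model_state, meshes, textures)  # incrementally refit the model
```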

By: Chenglei Wu, Takaaki Shiratori, Yaser Sheikh

November 24, 2018

On Periodic Functions as Regularizers for Quantization of Neural Networks

arXiv

Deep learning models have been successfully used in computer vision and many other fields. We propose an unorthodox algorithm for performing quantization of the model parameters.
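
As a sketch of the general idea, a periodic penalty that vanishes exactly at the quantization levels can be added to the task loss; the sin² form and scaling below are assumptions, not the paper's exact regularizer.

```python
# Sketch of a periodic quantization regularizer: zero at multiples of the step size,
# positive in between. The exact functional form used in the paper may differ.
import math
import torch

def periodic_quantization_penalty(weights, step):
    """Zero when every weight is an integer multiple of `step`, positive otherwise."""
    return torch.sin(math.pi * weights / step).pow(2).mean()

w = torch.tensor([0.0, 0.25, 0.5, 0.62])
print(periodic_quantization_penalty(w, step=0.25))  # only 0.62 (off the 0.25 grid) contributes

# During training one would minimize:
#   task_loss + lambda_reg * periodic_quantization_penalty(weights, step)
```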

By: Maxim Naumov, Utku Diril, Jongsoo Park, Benjamin Ray, Jedrzej Jablonski, Andrew Tulloch

November 2, 2018

Do explanations make VQA models more predictable to a human?

Empirical Methods in Natural Language Processing (EMNLP)

A rich line of research attempts to make deep neural networks more transparent by generating human-interpretable ‘explanations’ of their decision process, especially for interactive tasks like Visual Question Answering (VQA). In this work, we analyze if existing explanations indeed make a VQA model – its responses as well as failures – more predictable to a human.

By: Arjun Chandrasekaran, Viraj Prabhu, Deshraj Yadav, Prithvijit Chattopadhyay, Devi Parikh

November 1, 2018

Fast Depth Densification for Occlusion-aware Augmented Reality

SIGGRAPH Asia

Current AR systems only track sparse geometric features but do not compute depth for all pixels. For this reason, most AR effects are pure overlays that can never be occluded by real objects. We present a novel algorithm that propagates sparse depth to every pixel in near real time.
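
As a simple stand-in for sparse-to-dense propagation, the sketch below interpolates scattered depth samples to every pixel; the paper's method is an edge-aware, near-real-time optimization, not plain interpolation.

```python
# Stand-in for sparse-to-dense depth propagation using scattered-data interpolation.
import numpy as np
from scipy.interpolate import griddata

H, W = 120, 160
rng = np.random.default_rng(0)

# Sparse depth samples, e.g. at tracked feature locations (synthetic here).
pts = rng.integers(0, [H, W], size=(300, 2))
depth_sparse = 2.0 + 0.01 * pts[:, 0] + 0.02 * pts[:, 1]      # toy ground-truth depth plane

# Propagate to every pixel; nearest-neighbor fill covers pixels outside the convex hull.
yy, xx = np.mgrid[0:H, 0:W]
dense = griddata(pts, depth_sparse, (yy, xx), method='linear')
holes = np.isnan(dense)
dense[holes] = griddata(pts, depth_sparse, (yy, xx), method='nearest')[holes]
print(dense.shape)                                            # (120, 160) per-pixel depth
```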

By: Aleksander Holynski, Johannes Kopf