June 13, 2019

Multi-modal Content Localization in Videos Using Weak Supervision

International Conference on Machine Learning (ICML)

Identifying the temporal segments in a video that contain content relevant to a category or task is a difficult but interesting problem. This has applications in fine-grained video indexing and retrieval. Part of the difficulty in this problem comes from the lack of supervision since large-scale annotation of localized segments containing the content of interest is very expensive. In this paper, we propose to use the category assigned to an entire video as weak supervision to our model.

By: Gourab Kundu, Prahal Arora, Ferdi Adeputra, Polina Kuznetsova, Daniel McKinnon, Michelle Cheung, Larry Anazia, Geoffrey Zweig
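
A minimal sketch of the weak-supervision idea described above: per-segment scores are pooled into a single video-level prediction, so training needs only the category assigned to the whole video, while the segment scores can localize content at test time. The attention pooling and module shapes below are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class WeaklySupervisedLocalizer(nn.Module):
    """Scores every video segment, but is trained only with a video-level label."""
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.segment_scorer = nn.Linear(feat_dim, num_classes)  # per-segment class scores
        self.attention = nn.Linear(feat_dim, 1)                 # how much each segment matters

    def forward(self, segments):                                # segments: (num_segments, feat_dim)
        scores = self.segment_scorer(segments)                  # (num_segments, num_classes)
        weights = torch.softmax(self.attention(segments), dim=0)
        video_logits = (weights * scores).sum(dim=0)            # pooled video-level prediction
        return video_logits, scores                             # scores localize content per segment

# Training uses only the category assigned to the entire video.
model = WeaklySupervisedLocalizer(feat_dim=512, num_classes=20)
segments = torch.randn(30, 512)                                 # e.g., 30 segment features from an encoder
video_label = torch.tensor(3)
video_logits, segment_scores = model(segments)
loss = nn.functional.cross_entropy(video_logits.unsqueeze(0), video_label.unsqueeze(0))
loss.backward()
```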

June 11, 2019

Adversarial Inference for Multi-Sentence Video Description

Conference on Computer Vision and Pattern Recognition (CVPR)

While significant progress has been made in the image captioning task, video description is still in its infancy due to the complex nature of video data. Generating multi-sentence descriptions for long videos is even more challenging. Among the main issues are the fluency and coherence of the generated descriptions, and their relevance to the video.

By: Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach

June 10, 2019

GEOMetrics: Exploiting Geometric Structure for Graph-Encoded Objects

International Conference on Machine Learning (ICML)

Mesh models are a promising approach for encoding the structure of 3D objects. Current mesh reconstruction systems predict uniformly distributed vertex locations of a predetermined graph through a series of graph convolutions, leading to compromises with respect to performance or resolution. In this paper, we argue that the graph representation of geometric objects allows for additional structure, which should be leveraged for enhanced reconstruction.

By: Edward J. Smith, Scott Fujimoto, Adriana Romero, David Meger
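
As a rough illustration of the graph convolutions mentioned above, the sketch below applies one generic, row-normalized graph convolution to mesh vertex features; the normalization and layer shape are common defaults assumed here, not the paper's exact operator.

```python
import numpy as np

def graph_conv(vertex_feats, adjacency, weight):
    """One graph-convolution step over mesh vertices: aggregate neighbors, then transform."""
    a_hat = adjacency + np.eye(adjacency.shape[0])          # add self-loops
    deg = a_hat.sum(axis=1)
    a_norm = a_hat / deg[:, None]                           # row-normalized neighbor aggregation
    return np.maximum(a_norm @ vertex_feats @ weight, 0.0)  # aggregate, project, ReLU

# Toy mesh: 4 vertices of a tetrahedron, fully connected.
adjacency = np.ones((4, 4)) - np.eye(4)
vertex_feats = np.random.randn(4, 16)                       # per-vertex feature vectors
weight = np.random.randn(16, 3)                             # project toward 3D vertex offsets
new_feats = graph_conv(vertex_feats, adjacency, weight)     # (4, 3), e.g., predicted vertex shifts
```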

June 9, 2019

Slim DensePose: Thrifty Learning from Sparse Annotations and Motion Cues

Conference on Computer Vision and Pattern Recognition (CVPR)

DensePose supersedes traditional landmark detectors by densely mapping image pixels to body surface coordinates. This power, however, comes at the cost of greatly increased annotation time, as supervising the model requires manually labeling hundreds of points per pose instance. In this work, we thus seek methods to significantly slim down the DensePose annotations, proposing more efficient data collection strategies.

By: Natalia Neverova, James Thewlis, Riza Alp Guler, Iasonas Kokkinos, Andrea Vedaldi

June 7, 2019

Cycle-Consistency for Robust Visual Question Answering

Conference on Computer Vision and Pattern Recognition (CVPR)

Despite significant progress in Visual Question Answering over the years, the robustness of today’s VQA models leaves much to be desired. We introduce a new evaluation protocol and associated dataset (VQA-Rephrasings) and show that state-of-the-art VQA models are notoriously brittle to linguistic variations in questions.

By: Meet Shah, Xinlei Chen, Marcus Rohrbach, Devi Parikh
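
One way to read the evaluation protocol described above is as a consistency check over groups of rephrased questions. The sketch below is a simplified, assumed version of such a metric (a group counts only if every rephrasing is answered correctly), not the exact VQA-Rephrasings scoring.

```python
from collections import defaultdict

def consistency_score(predictions, ground_truth, group_ids):
    """Fraction of question groups (an original question plus its rephrasings)
    for which the model answers every variant correctly."""
    groups = defaultdict(list)
    for pred, gt, gid in zip(predictions, ground_truth, group_ids):
        groups[gid].append(pred == gt)
    consistent = sum(all(flags) for flags in groups.values())
    return consistent / len(groups)

# Toy example: two groups of rephrasings; the model flips its answer in the second group.
preds = ["red", "red", "red", "2", "3"]
truth = ["red", "red", "red", "2", "2"]
gids  = [0, 0, 0, 1, 1]
print(consistency_score(preds, truth, gids))  # 0.5
```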

June 4, 2019

Engaging Image Captioning via Personality

Conference on Computer Vision and Pattern Recognition (CVPR)

Standard image captioning tasks such as COCO and Flickr30k are factual, neutral in tone and (to a human) state the obvious (e.g., “a man playing a guitar”). While such tasks are useful to verify that a machine understands the content of an image, they are not engaging to humans as captions. With this in mind we define a new task, PERSONALITY-CAPTIONS, where the goal is to be as engaging to humans as possible by incorporating controllable style and personality traits.

By: Kurt Shuster, Samuel Humeau, Hexiang Hu, Antoine Bordes, Jason Weston

May 17, 2019

GLoMo: Unsupervised Learning of Transferable Relational Graphs

Conference on Neural Information Processing Systems (NeurIPS)

Modern deep transfer learning approaches have mainly focused on learning generic feature vectors from one task that are transferable to other tasks, such as word embeddings in language and pretrained convolutional features in vision. However, these approaches usually transfer unary features and largely ignore more structured graphical representations. This work explores the possibility of learning generic latent relational graphs that capture dependencies between pairs of data units (e.g., words or pixels) from large-scale unlabeled data and transferring the graphs to downstream tasks.

By: Zhilin Yang, Jake (Junbo) Zhao, Bhuwan Dhingra, Kaiming He, William W. Cohen, Ruslan Salakhutdinov, Yann LeCun
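
A minimal sketch of transferring a relational graph, as the abstract describes it: a small network predicts a pairwise affinity matrix over data units, and a downstream model mixes its own features through that graph. The two-layer scorer and the feature-mixing step below are illustrative assumptions, not the GLoMo architecture.

```python
import torch
import torch.nn as nn

class GraphPredictor(nn.Module):
    """Predicts a row-normalized affinity matrix over the units (e.g., words) of an input."""
    def __init__(self, dim):
        super().__init__()
        self.key = nn.Linear(dim, dim)
        self.query = nn.Linear(dim, dim)

    def forward(self, units):                            # units: (n, dim)
        logits = self.query(units) @ self.key(units).T   # (n, n) pairwise scores
        return torch.softmax(logits, dim=-1)             # each row sums to 1

# Pretrain the graph predictor on unlabeled data (omitted), then reuse the graph downstream.
graph_predictor = GraphPredictor(dim=64)
units = torch.randn(10, 64)                              # e.g., embeddings of 10 words
graph = graph_predictor(units)                           # (10, 10) latent relational graph

downstream_feats = torch.randn(10, 128)                  # task-specific features for the same units
mixed_feats = graph @ downstream_feats                   # propagate features along the learned graph
```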

May 6, 2019

Efficient Lifelong Learning with A-GEM

International Conference on Learning Representations (ICLR)

In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior that may be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong learning approaches in terms of sample complexity and computational and memory cost.

By: Arslan Chaudhry, Marc'Aurelio Ranzato, Marcus Rohrbach, Mohamed Elhoseiny
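
For context, the core A-GEM step projects the current task gradient when it conflicts with an average gradient computed on an episodic memory of past tasks. The sketch below shows that projection on flat NumPy vectors and is a simplification, not the full training procedure from the paper.

```python
import numpy as np

def agem_project(grad, memory_grad):
    """A-GEM-style projection: if the task gradient would increase the loss on the
    episodic-memory samples (negative dot product), remove the conflicting component."""
    dot = grad @ memory_grad
    if dot >= 0.0:
        return grad                                    # no interference, keep the gradient as is
    return grad - (dot / (memory_grad @ memory_grad)) * memory_grad

# Toy example: the raw gradient conflicts with the memory gradient.
grad = np.array([1.0, -2.0])
memory_grad = np.array([0.0, 1.0])
print(agem_project(grad, memory_grad))                 # [1. 0.] -- conflicting component removed
```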

May 6, 2019

Selfless Sequential Learning

International Conference on Learning Representations (ICLR)

Sequential learning, also called lifelong learning, studies the problem of learning tasks in a sequence with access restricted to only the data of the current task. In this paper we look at a scenario with fixed model capacity, and postulate that the learning process should not be selfish, i.e. it should account for future tasks to be added and thus leave enough capacity for them.

By: Rahaf Aljundi, Marcus Rohrbach, Tinne Tuytelaars
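
To make the point about leaving capacity concrete, one simple (and purely illustrative) option is a regularizer that pushes hidden activations toward sparsity, so the current task claims fewer units. The generic L1 activation penalty below is an assumption for illustration, not the paper's specific regularizer.

```python
import torch

def sparse_task_loss(task_loss, activations, sparsity_weight=0.01):
    """Augment the current task's loss with an activation-sparsity penalty so the
    model uses fewer units now and leaves capacity for tasks added later."""
    sparsity_penalty = activations.abs().mean()        # generic L1 penalty on hidden activations
    return task_loss + sparsity_weight * sparsity_penalty

# Toy usage with an arbitrary hidden layer's activations.
activations = torch.randn(32, 256)                     # batch of hidden activations
task_loss = torch.tensor(1.5)
total_loss = sparse_task_loss(task_loss, activations)
```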

May 4, 2019

Quasi-Hyperbolic Momentum and Adam for Deep Learning

International Conference on Learning Representations (ICLR)

Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. We describe numerous connections to and identities with other algorithms, and we characterize the set of two-state optimization algorithms that QHM can recover.

By: Jerry Ma, Denis Yarats
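
The abstract's description of QHM, averaging a plain SGD step with a momentum step, translates directly into a short update rule. The sketch below follows that description; the parameter names nu (the averaging weight) and beta (the momentum discount) and their default values are assumptions of this illustration.

```python
import numpy as np

def qhm_step(params, grad, momentum_buf, lr=0.1, beta=0.9, nu=0.7):
    """One quasi-hyperbolic momentum update: a weighted average of the plain SGD
    direction (the raw gradient) and the momentum direction."""
    momentum_buf = beta * momentum_buf + (1.0 - beta) * grad  # exponential moving average of gradients
    update = (1.0 - nu) * grad + nu * momentum_buf            # average plain SGD step with momentum step
    return params - lr * update, momentum_buf

# Toy quadratic: minimize 0.5 * ||x||^2, whose gradient is x itself.
params = np.array([1.0, -2.0])
momentum_buf = np.zeros_like(params)
for _ in range(100):
    params, momentum_buf = qhm_step(params, grad=params, momentum_buf=momentum_buf)
print(params)  # close to the optimum at the origin
```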