October 22, 2017

Mask R-CNN

International Conference on Computer Vision (ICCV)

We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.

Kaiming He, Georgia Gkioxari, Piotr Dollar, Ross Girshick
October 22, 2017

Focal Loss for Dense Object Detection

International Conference on Computer Vision (ICCV)

The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case.

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollar
July 21, 2017

Learning Features by Watching Objects Move

CVPR 2017

This paper presents a novel yet intuitive approach to unsupervised feature learning. Inspired by the human visual system, we explore whether low-level motion-based grouping cues can be used to learn an effective visual representation.

Deepak Pathak, Ross Girshick, Piotr Dollar, Trevor Darrell, Bharath Hariharan
July 21, 2017

Feature Pyramid Networks for Object Detection

CVPR 2017

In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost.

Tsung-Yi Lin, Piotr Dollar, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie
July 21, 2017

Semantic Amodal Segmentation

CVPR 2017

Common visual recognition tasks such as classification, object detection, and semantic segmentation are rapidly reaching maturity, and given the recent rate of progress, it is not unreasonable to conjecture that techniques for many of these problems will approach human levels of performance in the next few years. In this paper we look to the future: what is the next frontier in visual recognition?

Yan Zhu, Yuandong Tian, Dimitris Metaxas, Piotr Dollar
July 21, 2017

Aggregated Residual Transformations for Deep Neural Networks

CVPR 2017

We present a simple, highly modularized network architecture for image classification.

Saining Xie, Ross Girshick, Piotr Dollar, Zhuowen Tu, Kaiming He
June 8, 2017

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Data @ Scale

In this paper, we empirically show that on the ImageNet dataset large minibatches cause optimization difficulties, but when these are addressed the trained networks exhibit good generalization.

Priya Goyal, Piotr Dollar, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He
April 3, 2017

Bag of Tricks for Efficient Text Classification

European Chapter of the Association for Computational Linguistics (EACL)

This paper explores a simple and efficient baseline for text classification.

Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov
February 22, 2017

Automatic Alt-text: Computer-generated Image Descriptions for Blind Users on a Social Network Service


This paper covers the design and deployment of automatic alt-text (AAT), a system that applies computer vision technology to identify faces, objects, and themes in photos and generate photo alt-text for screen reader users on Facebook.

Jeffrey Wieland, Julie Schiller, Omid Farivar, Shaomei Wu
February 7, 2017

Exploring Normalization in Deep Residual Networks with Concatenated Rectified Linear Units


This paper analyzes the role of Batch Normalization (BatchNorm) layers on ResNets in the hope of improving the current architecture and better incorporating other normalization techniques, such as Normalization Propagation (NormProp), into ResNets.

Wenling Shang, Justin Chiu, Kihyuk Sohn
December 6, 2016

Population Density Estimation with Deconvolutional Neural Networks

Workshop on Large Scale Computer Vision at NIPS 2016

This work is part of the Internet.org initiative to provide connectivity all over the world. Population density data is helpful in driving a variety of technology decisions, but currently, a microscopic dataset of population doesn’t exist. Current state-of-the-art population density datasets are at ~1000 km² resolution. To create a better dataset, we have obtained 1 PB of satellite imagery at 50 cm/pixel resolution to feed through our building classification pipeline.

Amy Zhang, Andreas Gros, Tobias Tiecke, Xianming Liu
November 30, 2016

Semantic Segmentation using Adversarial Networks

Workshop on Adversarial Training at NIPS 2016

Adversarial training has been shown to produce state-of-the-art results for generative image modeling. In this paper, we propose an adversarial training approach to train semantic segmentation models.

Pauline Luc, Camille Couprie, Soumith Chintala, Jakob Verbeek
October 10, 2016

Learning to Refine Object Segments

European Conference on Computer Vision

In this work we propose to augment feedforward nets for object segmentation with a novel top-down refinement approach.

Pedro O. Pinheiro, Tsung-Yi Lin, Ronan Collobert, Piotr Dollar
October 10, 2016

Revisiting Visual Question Answering Baselines

European Conference on Computer Vision 2016

This paper questions the value of common practices and develops a simple alternative model based on binary classification.

Allan Jabri, Armand Joulin, Laurens van der Maaten
October 10, 2016

Polysemous Codes

European Conference on Computer Vision 2016 (ECCV)

This paper considers the problem of approximate nearest neighbor search in the compressed domain.

Matthijs Douze, Hervé Jégou, Florent Perronnin
October 8, 2016

Learning Visual Features from Large Weakly Supervised Data

European Conference on Computer Vision

In this paper, we explore the potential of leveraging massive, weakly-labeled image collections for learning good visual features.

Armand Joulin, Laurens van der Maaten, Allan Jabri, Nicolas Vasilache
September 18, 2016

A MultiPath Network for Object Detection


We test three modifications to the standard Fast R-CNN object detector to determine whether they can overcome the object detection challenges posed by the COCO object detection dataset.

Sergey Zagoruyko, Adam Lerer, Tsung-Yi Lin, Pedro O. Pinheiro, Sam Gross, Soumith Chintala, Piotr Dollar
September 8, 2016

Joint Learning of Speaker and Phonetic Similarities with Siamese Networks

Interspeech 2016

We scale up the joint learning of specialized speaker and phone embeddings to the 360 hours of the Librispeech corpus by implementing a sampling method that efficiently selects pairs of words from the dataset and by improving the loss function.

Neil Zeghidour, Gabriel Synnaeve, Nicolas Usunier, Emmanuel Dupoux
August 16, 2016

Synergy of Monotonic Rules


This article describes a method for constructing a special rule (which we call the synergy rule) that takes as input the outputs (scores) of several monotonic rules that solve the same pattern recognition problem.

Vladimir Vapnik, Rauf Izmailov
August 10, 2016

Neural Network-based Word Alignment through Score Aggregation

Association for Computational Linguistics Conference on Machine Translation

We present a simple neural network for word alignment that builds source and target word window representations to compute alignment scores for sentence pairs.

Michael Auli, Ronan Collobert, Joel Legrand