July 21, 2017

Link the head to the “beak”: Zero Shot Learning from Noisy Text Description at Part Precision

CVPR 2017

In this paper, we study learning visual classifiers from unstructured text descriptions at part precision with no training images. We propose a learning framework that is able to connect text terms to their relevant parts and suppress connections to non-visual text terms without any part-text annotations.

By: Mohamed Elhoseiny, Yizhe Zhu, Han Zhang, Ahmed Elgammal
July 21, 2017

Relationship Proposal Networks

Conference on Computer Vision and Pattern Recognition 2017

In this paper, we address the challenges of object recognition in image scenes by using pairs of related regions in images to train a relationship proposer that, at test time, produces a manageable number of related regions.

By: Ahmed Elgammal, Ji Zhang, Mohamed Elhoseiny, Scott Cohen, Walter Chang
June 8, 2017

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Data @ Scale

In this paper, we empirically show that on the ImageNet dataset large minibatches cause optimization difficulties, but when these are addressed the trained networks exhibit good generalization.

By: Priya Goyal, Piotr Dollar, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He
May 16, 2017

Cultural Diffusion and Trends in Facebook Photographs

The International AAAI Conference on Web and Social Media (ICWSM)

Online social media is a social vehicle in which people share various moments of their lives with their friends, such as playing sports, cooking dinner or just taking a selfie for fun, via visual means, i.e., photographs. Our study takes a closer look at the popular visual concepts illustrating various cultural lifestyles from aggregated, de-identified photographs.

By: Quanzeng You, Dario Garcia, Manohar Paluri, Jiebo Luo, Jungseock Joo
February 22, 2017

Automatic Alt-text: Computer-generated Image Descriptions for Blind Users on a Social Network Service


This paper covers the design and deployment of automatic alt-text (AAT), a system that applies computer vision technology to identify faces, objects, and themes in photos and generate photo alt-text for screen reader users on Facebook.

By: Shaomei Wu, Jeffrey Wieland, Omid Farivar, Julie Schiller
December 6, 2016

Population Density Estimation with Deconvolutional Neural Networks

Workshop on Large Scale Computer Vision at NIPS 2016

This work is part of the initiative to provide connectivity all over the world. Population density data is helpful in driving a variety of technology decisions, but currently, a microscopic dataset of population doesn’t exist. Current state-of-the-art population density datasets are at ~1000 km² resolution. To create a better dataset, we have obtained 1 PB of satellite imagery at 50 cm/pixel resolution to feed through our building classification pipeline.

By: Amy Zhang, Andreas Gros, Tobias Tiecke, Xianming Liu
December 6, 2016

Feedback Neural Network for Weakly Supervised Geo-Semantic Segmentation


We propose a novel neural network architecture to perform weakly-supervised learning by suppressing irrelevant neuron activations. When applied to the practical challenge of transforming satellite images into a map of settlements and individual buildings, it delivers superior performance and efficiency.

By: Xianming Liu, Amy Zhang, Tobias Tiecke, Andreas Gros, Thomas S. Huang
October 10, 2016

Learning to Refine Object Segments

European Conference on Computer Vision

In this work we propose to augment feedforward nets for object segmentation with a novel top-down refinement approach.

By: Pedro O. Pinheiro, Tsung-Yi Lin, Ronan Collobert, Piotr Dollar
September 18, 2016

A MultiPath Network for Object Detection


We test three modifications to the standard Fast R-CNN object detector to determine whether they can overcome the object detection challenges in the COCO object detection dataset.

By: Sergey Zagoruyko, Adam Lerer, Tsung-Yi Lin, Pedro O. Pinheiro, Sam Gross, Soumith Chintala, Piotr Dollar
July 25, 2016

Single Image 3D Interpreter Network

European Conference on Computer Vision (ECCV)

In this work, we propose 3D INterpreter Network (3D-INN), an end-to-end framework which sequentially estimates 2D keypoint heatmaps and 3D object structure, trained on both real 2D-annotated images and synthetic 3D data.

By: Antonio Torralba, Jiajun Wu, Joseph J. Lim, Joshua B. Tenenbaum, Tianfan Xue, William T. Freeman, Yuandong Tian