161 Results

August 26, 2018

Rosetta: Large Scale System for Text Detection and Recognition in Images

Knowledge Discovery in Databases (KDD)

In this paper we present a deployed, scalable optical character recognition (OCR) system, which we call Rosetta, designed to process images uploaded daily at Facebook scale.

By: Fedor Borisyuk, Albert Gordo, Viswanath Sivakumar

August 21, 2018

SE-Sync: A certifiably correct algorithm for synchronization over the special Euclidean group

International Journal of Robotics Research

Many important geometric estimation problems naturally take the form of synchronization over the special Euclidean group: estimate the values of a set of unknown group elements $x_1, \dots, x_n \in \mathrm{SE}(d)$ given noisy measurements of a subset of their pairwise relative transforms $x_i^{-1} x_j$. Examples of this class include the foundational problems of pose-graph simultaneous localization and mapping (SLAM) (in robotics), camera motion estimation (in computer vision), and sensor network localization (in distributed sensing), among others. This inference problem is typically formulated as a nonconvex maximum-likelihood estimation that is computationally hard to solve in general. Nevertheless, in this paper we present an algorithm that is able to efficiently recover certifiably globally optimal solutions of the special Euclidean synchronization problem in a non-adversarial noise regime.
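To make the measurement model concrete, the sketch below computes a noise-free relative transform (the inverse of pose i composed with pose j) for planar poses, i.e. the d = 2 case. It is only an illustration of the quantity being measured, not the SE-Sync algorithm; the `(x, y, theta)` pose representation and the helper names `compose`, `inverse`, and `relative` are assumptions introduced here.

```python
import math

# A planar pose in SE(2) is represented here as (x, y, theta):
# a translation (x, y) and a rotation angle theta in radians.
# This representation and all function names are illustrative assumptions.

def compose(a, b):
    """Return the group product a * b (apply b in the frame of a)."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    # Rotate b's translation by a's rotation, then add a's translation.
    # Angles are summed without wrapping to (-pi, pi].
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)

def inverse(a):
    """Return the group inverse of a: rotation R^T, translation -R^T t."""
    ax, ay, at = a
    c, s = math.cos(at), math.sin(at)
    return (-(c * ax + s * ay), -(-s * ax + c * ay), -at)

def relative(xi, xj):
    """Noise-free relative transform inverse(xi) * xj: pose j seen from pose i."""
    return compose(inverse(xi), xj)

if __name__ == "__main__":
    xi = (1.0, 0.0, math.pi / 2)  # at (1, 0), facing the +y direction
    xj = (1.0, 2.0, math.pi / 2)  # at (1, 2), same heading
    print(relative(xi, xj))
```

Here pose j lies two units straight ahead of pose i with the same heading, so the relative transform is approximately (2, 0, 0) up to floating-point rounding. A synchronization method would receive noisy versions of such relative transforms for a subset of pairs and estimate the absolute poses from them.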

By: David M. Rosen, Luca Carlone, Afonso S. Bandeira, John J. Leonard

August 14, 2018

Deep Appearance Models for Face Rendering


We introduce a deep appearance model for rendering the human face. Inspired by Active Appearance Models, we develop a data-driven rendering pipeline that learns a joint representation of facial geometry and appearance from a multiview capture setup.

By: Stephen Lombardi, Jason Saragih, Tomas Simon, Yaser Sheikh

August 13, 2018

Unsupervised Generation of Free-Form and Parameterized Avatars


We study two problems involving the task of mapping images between different domains. The first problem transfers an image in one domain to an analogous image in another domain. The second problem extends the first by mapping an input image to a tied pair consisting of a vector of parameters and an image created by a graphical engine from this vector of parameters.

By: Adam Polyak, Yaniv Taigman, Lior Wolf

August 1, 2018

Neural Modular Control for Embodied Question Answering

Conference on Robot Learning (CoRL)

We present a modular approach for learning policies for navigation over long planning horizons from language input. Our hierarchical policy operates at multiple timescales, where the higher-level master policy proposes subgoals to be executed by specialized sub-policies.

By: Abhishek Das, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra

July 30, 2018

Instant 3D Photography


We present an algorithm for constructing 3D panoramas from a sequence of aligned color-and-depth image pairs. Such sequences can be conveniently captured using dual-lens cell phone cameras that reconstruct depth maps from synchronized stereo image capture.

By: Peter Hedman, Johannes Kopf

July 29, 2018

Online Optical Marker-based Hand Tracking with Deep Labels

Special Interest Group on Computer Graphics and Interactive Techniques (SIGGRAPH)

We propose a technique that frames the labeling problem as a keypoint regression problem conducive to a solution using convolutional neural networks.

By: Shangchen Han, Beibei Liu, Robert Wang, Yuting Ye, Christopher D. Twigg, Kenrick Kin

July 10, 2018

Optimizing the Latent Space of Generative Networks

International Conference on Machine Learning (ICML)

The goal of this paper is to disentangle the contributions of two factors to the success of GANs: the adversarial training procedure and the use of deep convolutional networks to parameterize the generator and discriminator. In particular, we introduce Generative Latent Optimization (GLO), a framework to train deep convolutional generators using simple reconstruction losses.

By: Piotr Bojanowski, Armand Joulin, David Lopez-Paz, Arthur Szlam

July 7, 2018

Reconstructing Scenes with Mirror and Glass Surfaces


We introduce a fully automatic pipeline that allows us to reconstruct the geometry and extent of planar glass and mirror surfaces while being able to distinguish between the two.

By: Thomas Whelan, Michael Goesele, Steven J. Lovegrove, Julian Straub, Simon Green, Richard Szeliski, Steven Butterfield, Shobhit Verma, Richard Newcombe

June 19, 2018

A Generative Adversarial Approach for Zero-Shot Learning from Noisy Texts

Computer Vision and Pattern Recognition (CVPR)

Most existing zero-shot learning methods consider the problem as a visual semantic embedding one. Given the demonstrated capability of Generative Adversarial Networks (GANs) to generate images, we instead leverage GANs to imagine unseen categories from text descriptions and hence recognize novel classes with no examples being seen.

By: Yizhe Zhu, Mohamed Elhoseiny, Bingchen Liu, Xi Peng, Ahmed Elgammal