Explore the latest in Facebook Research through publications

All Publications

June 19, 2020 Ishan Misra, Laurens van der Maaten

Self-Supervised Learning of Pretext-Invariant Representations

The goal of self-supervised learning from images is to construct image representations that are semantically meaningful via pretext tasks that do not require semantic annotations. Many pretext tasks lead to representations that are covariant with image transformations. We argue that, instead, semantic representations ought to be invariant under such transformations.
Paper
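
A rough sketch of the pretext-invariance objective: encode an image and a transformed copy, and use a contrastive loss so that matching pairs score higher than all other images in the batch. This simplifies PIRL, which uses a jigsaw-style pretext task and a memory bank of negatives; `encoder` and `transform` here are generic stand-ins.

```python
import torch
import torch.nn.functional as F

def invariance_loss(encoder, images, transform, temperature=0.07):
    """Contrastive loss pulling each image toward its transformed copy
    and away from the other images in the batch (simplified; not PIRL's
    exact memory-bank formulation)."""
    z = F.normalize(encoder(images), dim=1)               # original views
    z_t = F.normalize(encoder(transform(images)), dim=1)  # transformed views
    logits = z @ z_t.t() / temperature                    # pairwise cosine similarities
    targets = torch.arange(z.size(0), device=z.device)    # positives lie on the diagonal
    return F.cross_entropy(logits, targets)
```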
June 18, 2020 Vignesh Ramanathan, Rui Wang, Dhruv Mahajan

DLWL: Improving Detection for Lowshot Classes with Weakly Labelled Data

Large detection datasets have a long tail of lowshot classes with very few bounding box annotations. We wish to improve detection for lowshot classes using weakly labelled web-scale datasets that carry only image-level labels. This requires a detection framework that can be jointly trained with a limited number of bounding-box-annotated images and a large number of weakly labelled images. Towards this end, we propose a modification to the FRCNN model that automatically infers label assignment for object proposals from weakly labelled images during training.
Paper
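
One way to picture the label-assignment step: for each class present at image level, treat the detector's top-scoring proposals as pseudo ground-truth boxes. The paper formulates assignment more carefully; this greedy sketch, with a hypothetical `proposal_scores` input, is only illustrative.

```python
import torch

def assign_pseudo_labels(proposal_scores, image_labels, top_k=3):
    """proposal_scores: (num_proposals, num_classes) detector scores.
    image_labels: class indices known to be present in the image.
    Returns (proposal_index, class) pairs to supervise the detector."""
    assignments = []
    for c in image_labels:
        k = min(top_k, proposal_scores.size(0))
        top = torch.topk(proposal_scores[:, c], k=k).indices  # best proposals for c
        assignments.extend((int(i), c) for i in top)
    return assignments
```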
June 18, 2020 Jiasen Lu, Vedanuj Goswami, Marcus Rohrbach, Devi Parikh, Stefan Lee

12-in-1: Multi-Task Vision and Language Representation Learning

Much of vision-and-language research focuses on a small but diverse set of independent tasks and supporting datasets, often studied in isolation; however, the visually grounded language understanding skills required for success at these tasks overlap significantly. In this work, we investigate these relationships between vision-and-language tasks by developing a large-scale, multi-task training regime.
Paper
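
A minimal sketch of one style of multi-task step, assuming a shared trunk with per-task heads and a dict of infinite data loaders; the paper's actual regime adds task scheduling and architectural details beyond this.

```python
import random

def multitask_step(trunk, heads, loaders, optimizer):
    """Sample one task, run its batch through the shared trunk and
    task-specific head, and update the shared parameters."""
    task = random.choice(list(loaders))              # pick a task for this step
    batch = next(loaders[task])                      # its next mini-batch (stand-in keys below)
    features = trunk(batch["inputs"])                # shared representation
    loss = heads[task](features, batch["targets"])   # per-task head returns a loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return task, loss.item()
```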
June 16, 2020 Zhenqiang Ying, Haoran Niu, Praful Gupta, Dhruv Mahajan, Deepti Ghadiyaram, Alan Bovik

From Patches to Pictures (PaQ-2-PiQ): Mapping the Perceptual Space of Picture Quality

Blind or no-reference (NR) perceptual picture quality prediction is a difficult, unsolved problem of great consequence to the social and streaming media industries that impacts billions of viewers daily. Unfortunately, popular NR prediction models perform poorly on real-world distorted pictures. To advance progress on this problem, we introduce the largest (by far) subjective picture quality database, containing about 40,000 real-world distorted pictures and 120,000 patches, on which we collected about 4M human judgments of picture quality.
Paper
June 16, 2020 Yihui He, Rui Yan, Katerina Fragkiadaki, Shoou-I Yu

Epipolar Transformers

We propose the differentiable “epipolar transformer”, which enables the 2D detector to leverage 3D-aware features to improve 2D pose estimation. The intuition is: given a 2D location p in the current view, we would like to first find its corresponding point p′ in a neighboring view, and then combine the features at p′ with the features at p, thus leading to a 3D-aware feature at p.
Paper
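
The fusion step might be sketched as follows: sample features along p's epipolar line in the neighboring view, form a similarity-weighted soft estimate of the feature at the correspondence p′, and add it to the feature at p. Sampling and calibration details are simplified stand-ins here.

```python
import torch.nn.functional as F

def epipolar_fuse(feat_p, line_feats):
    """feat_p: (C,) feature at location p in the current view.
    line_feats: (N, C) features sampled along p's epipolar line in a
    neighboring view. Returns a 3D-aware fused feature at p."""
    weights = F.softmax(line_feats @ feat_p, dim=0)  # similarity of each sample to p
    feat_corr = weights @ line_feats                 # soft feature at the match p'
    return feat_p + feat_corr                        # fuse the two views
```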
June 16, 2020 Shunsuke Saito, Tomas Simon, Jason Saragih, Hanbyul Joo

PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization

Recent advances in image-based 3D human shape estimation have been driven by the significant improvement in representation power afforded by deep neural networks. Although current approaches have demonstrated the potential in real world settings, they still fail to produce reconstructions with the level of detail often present in the input images. We argue that this limitation stems primarily from two conflicting requirements: accurate predictions require large context, but precise predictions require high resolution.
Paper
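
A minimal sketch of a two-level pixel-aligned implicit function, assuming orthographic projection and stand-in modules (`enc_c`, `enc_f`, `mlp_c`, `mlp_f`): the coarse level sees holistic context at low resolution, while the fine level adds high-resolution detail conditioned on the coarse output.

```python
import torch
import torch.nn.functional as F

def sample(feat, uv):
    """Bilinearly sample a (1, C, H, W) feature map at (N, 2) coords in [-1, 1]."""
    grid = uv.view(1, -1, 1, 2)                                           # (1, N, 1, 2)
    return F.grid_sample(feat, grid, align_corners=True)[0, :, :, 0].t()  # (N, C)

def occupancy(points, img_lo, img_hi, enc_c, enc_f, mlp_c, mlp_f):
    """points: (N, 3) query locations; returns per-point in/out occupancy."""
    uv, z = points[:, :2], points[:, 2:]                             # projection + depth
    omega = mlp_c(torch.cat([sample(enc_c(img_lo), uv), z], 1))      # coarse, holistic prediction
    return mlp_f(torch.cat([sample(enc_f(img_hi), uv), omega], 1))   # fine-level refinement
```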
June 16, 2020 Rui Wang, David Geraghty, Kevin Matzen, Richard Szeliski, Jan-Michael Frahm

VPLNet: Deep Single View Normal Estimation with Vanishing Points and Lines

We present a novel single-view surface normal estimation method that combines traditional line and vanishing point analysis with a deep learning approach.
Paper
June 16, 2020 Weiyao Wang, Du Tran, Matt Feiszli

What Makes Training Multi-modal Classification Networks Hard?

Joint training of a multi-modal network often underperforms its best single-modal counterpart, despite having access to strictly more information. This paper identifies two main causes for this performance drop. First, multi-modal networks are often prone to overfitting due to their increased capacity. Second, different modalities overfit and generalize at different rates, so training them jointly with a single optimization strategy is sub-optimal. We address these two problems with a technique we call Gradient-Blending, which computes an optimal blending of modalities based on their overfitting behaviors.
Paper
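
The weighting principle behind Gradient-Blending can be sketched as below: estimate each modality's generalization gain and overfitting from the change in its train and validation losses, then weight its loss proportionally to G/O². The paper's exact measures and re-weighting schedule differ in detail.

```python
def blending_weights(train_loss, val_loss, prev_train, prev_val, eps=1e-8):
    """All arguments are dicts keyed by modality (e.g. 'rgb', 'audio', 'joint').
    Returns normalized per-modality weights for the blended loss."""
    weights = {}
    for m in train_loss:
        gen = prev_val[m] - val_loss[m]                    # G: drop in validation loss
        overfit = ((val_loss[m] - train_loss[m])
                   - (prev_val[m] - prev_train[m]))        # O: growth of the train/val gap
        weights[m] = max(gen, 0.0) / (overfit ** 2 + eps)  # weight ~ G / O^2
    total = sum(weights.values()) or 1.0
    return {m: w / total for m, w in weights.items()}
```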
June 16, 2020 Edoardo Remelli, Shangchen Han, Sina Honari, Pascal Fua, Robert Wang

Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation

We present a lightweight solution to recover 3D pose from multi-view images captured with spatially calibrated cameras. Building upon recent advances in interpretable representation learning, we exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Paper
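
One simplified way to picture view-disentangled fusion, assuming calibrated world-from-camera rotations: rotate per-view pose features into a shared world frame and average them into a camera-independent latent. This illustrates the disentangling idea only, not the paper's exact architecture.

```python
import torch

def fuse_views(view_feats, rotations):
    """view_feats: (V, J, 3) per-view, 3D-structured pose features.
    rotations: (V, 3, 3) world-from-camera rotation matrices.
    Returns a (J, 3) pose latent shared across all views."""
    world = torch.einsum('vij,vkj->vki', rotations, view_feats)  # rotate into world frame
    return world.mean(dim=0)                                     # viewpoint-independent fusion
```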
June 15, 2020 Zeng Huang, Yuanlu Xu, Christoph Lassner, Hao Li, Tony Tung

ARCH: Animatable Reconstruction of Clothed Humans

In this paper, we propose ARCH (Animatable Reconstruction of Clothed Humans), a novel end-to-end framework for accurate reconstruction of animation-ready 3D clothed humans from a monocular image.
Paper