Explore the latest research from Facebook

All Publications

June 24, 2021 Zhongzheng Ren, Ishan Misra, Alexander Schwing, Rohit Girdhar
Paper

3D Spatial Recognition without Spatially Labeled 3D

We introduce WyPR, a Weakly-supervised framework for Point cloud Recognition, requiring only scene-level class tags as supervision. WyPR jointly addresses three core 3D recognition tasks: point-level semantic segmentation, 3D proposal generation, and 3D object detection, coupling their predictions through self- and cross-task consistency losses.
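The abstract does not spell out the loss, but a cross-task consistency term of the kind it describes can be sketched as follows: the average segmentation prediction over the points inside a proposal should agree with that proposal's detection class distribution. All names and the KL form here are illustrative assumptions, not WyPR's actual objective.

```python
import torch
import torch.nn.functional as F

def cross_task_consistency(point_logits, point_in_box, box_logits):
    """Hypothetical cross-task consistency loss (illustrative, not WyPR's):
    the mean segmentation distribution over a proposal's points should
    match the proposal's detection class distribution.

    point_logits: (N, C) per-point class logits
    point_in_box: (B, N) boolean mask of points inside each of B proposals
    box_logits:   (B, C) per-proposal class logits
    """
    point_probs = F.softmax(point_logits, dim=-1)            # (N, C)
    mask = point_in_box.float()                              # (B, N)
    # Average the per-point distributions over each proposal's points.
    pooled = mask @ point_probs / mask.sum(-1, keepdim=True).clamp(min=1)
    box_log_probs = F.log_softmax(box_logits, dim=-1)        # (B, C)
    # KL(pooled || box) pushes the two tasks to agree per proposal.
    return F.kl_div(box_log_probs, pooled, reduction="batchmean")
```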
June 22, 2021 Jing Huang, Guan Pang, Rama Kovvuri, Mandy Toh, Kevin J Liang, Praveen Krishnan, Xi Yin, Tal Hassner
Paper

A Multiplexed Network for End-to-End, Multilingual OCR

In this paper, we propose an end-to-end (E2E) approach, Multiplexed Multilingual Mask TextSpotter, that performs script identification at the word level and handles different scripts with different recognition heads, all while maintaining a unified loss that simultaneously optimizes script identification and multiple recognition heads.
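The multiplexing idea can be sketched structurally: a script-identification head routes each word feature to one of several script-specific recognition heads. The class below is an assumed minimal skeleton (linear heads, hard routing), not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiplexedRecognizer(nn.Module):
    """Illustrative sketch (not the paper's implementation): a shared
    script classifier routes each word feature to one of several
    script-specific recognition heads."""

    def __init__(self, feat_dim, scripts, vocab_sizes):
        super().__init__()
        self.script_id = nn.Linear(feat_dim, len(scripts))   # script classifier
        # One recognition head per script (reduced here to a linear layer).
        self.heads = nn.ModuleDict(
            {s: nn.Linear(feat_dim, v) for s, v in zip(scripts, vocab_sizes)}
        )
        self.scripts = scripts

    def forward(self, word_feats):
        script_logits = self.script_id(word_feats)           # (N, S)
        routes = script_logits.argmax(dim=-1)                # hard routing
        outputs = []
        for feat, r in zip(word_feats, routes):
            head = self.heads[self.scripts[r]]               # pick per-script head
            outputs.append(head(feat))
        return script_logits, outputs
```

In training, the hard argmax routing would typically be replaced or supplemented by a differentiable script-identification loss so both components can be optimized jointly, as the abstract's "unified loss" suggests.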
June 21, 2021 Kenneth Marino, Xinlei Chen, Devi Parikh, Abhinav Gupta, Marcus Rohrbach
Paper

KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA

In this work we study open-domain knowledge: the setting in which the knowledge required to answer a question is not given or annotated at either training or test time. We tap into two types of knowledge representations and reasoning: implicit knowledge and explicit symbolic knowledge.
June 21, 2021 Amit Raj, Julian Tanke, James Hays, Minh Vo, Carsten Stoll, Christoph Lassner
Paper

ANR: Articulated Neural Rendering for Virtual Avatars

We present Articulated Neural Rendering (ANR), a novel framework based on Deferred Neural Rendering (DNR) that explicitly addresses DNR's limitations for virtual human avatars. We show the superiority of ANR not only with respect to DNR but also with respect to methods specialized for avatar creation and animation.
June 21, 2021 Christoph Lassner, Michael Zollhöfer
Paper

Pulsar: Efficient Sphere-based Neural Rendering

We propose Pulsar, an efficient sphere-based differentiable rendering module that is orders of magnitude faster than competing techniques, modular, and easy to use thanks to its tight integration with PyTorch.
June 20, 2021 Yanghao Li, Tushar Nagarajan, Bo Xiong, Kristen Grauman
Paper

Ego-Exo: Transferring Visual Representations from Third-person to First-person Videos

We introduce an approach for pre-training egocentric video models using large-scale third-person video datasets.
June 19, 2021 Pedro Morgado, Nuno Vasconcelos, Ishan Misra
Paper

Audio-Visual Instance Discrimination with Cross-Modal Agreement

We present a self-supervised learning approach to learn audio-visual representations from video and audio. Our method uses contrastive learning for cross-modal discrimination of video from audio and vice versa.
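Cross-modal instance discrimination of this kind is commonly implemented as a symmetric InfoNCE-style loss; the sketch below shows that generic form, not the paper's exact objective. Each video embedding must identify its paired audio clip among the batch, and vice versa.

```python
import torch
import torch.nn.functional as F

def cross_modal_nce(video_emb, audio_emb, temperature=0.07):
    """Generic cross-modal contrastive loss (assumed InfoNCE form):
    matched video/audio pairs sit on the diagonal of the similarity
    matrix and serve as positives; all other pairs are negatives."""
    v = F.normalize(video_emb, dim=-1)
    a = F.normalize(audio_emb, dim=-1)
    logits = v @ a.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(v.size(0))          # positives on the diagonal
    # Symmetric loss: video -> audio and audio -> video.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

Perfectly aligned embeddings drive the loss toward zero, while mismatched pairs in the batch act as the negatives that make the discrimination task non-trivial.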
June 19, 2021 Menglin Jia, Zuxuan Wu, Austin Reiter, Claire Cardie, Serge Belongie, Ser-Nam Lim
Paper

Intentonomy: a Dataset and Study towards Human Intent Understanding

In this paper, we study the intent behind social media images with an aim to analyze how visual information can facilitate recognition of human intent. Towards this goal, we introduce an intent dataset, Intentonomy, comprising 14K images covering a wide range of everyday scenes.
June 19, 2021 Shugao Ma, Tomas Simon, Jason Saragih, Dawei Wang, Yuecheng Li, Fernando De la Torre, Yaser Sheikh
Paper

Pixel Codec Avatars

In this work, we present the Pixel Codec Avatars (PiCA): a deep generative model of 3D human faces that achieves state-of-the-art reconstruction performance while being computationally efficient and adaptive to the rendering conditions during execution.
June 19, 2021 Pedro Morgado, Ishan Misra, Nuno Vasconcelos
Paper

Robust Audio-Visual Instance Discrimination

To alleviate the impact of faulty negatives, we propose to optimize an instance discrimination loss with a soft target distribution that estimates relationships between instances. We validate our contributions through extensive experiments on action recognition tasks and show that they address the problems of audio-visual instance discrimination and improve transfer learning performance.
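A soft target distribution replaces the usual one-hot contrastive target: instead of treating every other instance as a hard negative, the target spreads mass over instances in proportion to their estimated relationships. The sketch below is a hypothetical minimal form of that idea; the similarity estimate and names are assumptions, not the paper's method.

```python
import torch
import torch.nn.functional as F

def soft_target_nce(query, keys, instance_sims, temperature=0.07):
    """Hypothetical instance discrimination with soft targets
    (illustrative, not the paper's exact loss): estimated
    instance-to-instance similarities soften the target so that
    'faulty' negatives resembling the query are penalized less.

    query:         (B, D) query embeddings
    keys:          (B, D) candidate instance embeddings
    instance_sims: (B, B) estimated relationships between instances
    """
    q = F.normalize(query, dim=-1)
    k = F.normalize(keys, dim=-1)
    logits = q @ k.t() / temperature
    targets = F.softmax(instance_sims, dim=-1)     # soft targets, rows sum to 1
    log_probs = F.log_softmax(logits, dim=-1)
    # Cross-entropy against the soft distribution instead of a one-hot label.
    return -(targets * log_probs).sum(dim=-1).mean()
```

With a sharply peaked `instance_sims` (e.g. a large diagonal), this reduces to the standard one-hot instance discrimination loss.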