Explore the latest in Facebook Research through publications

All Publications

January 1, 2021 Mahmoud Assran, Michael Rabbat

Asynchronous Gradient-Push

We consider a multi-agent framework for distributed optimization where each agent has access to a local smooth strongly convex function, and the collective goal is to achieve consensus on the parameters that minimize the sum of the agents’ local functions. We propose an algorithm wherein each agent operates asynchronously and independently of the other agents.
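To make the setting above concrete, here is a minimal synchronous sketch of consensus-based distributed gradient descent. It is a simplified stand-in, not the paper's asynchronous gradient-push algorithm: the function name, the uniform doubly stochastic mixing weights, and the quadratic local objectives are all illustrative assumptions.

```python
# Each agent i holds a local function f_i(x) = (x - a_i)^2; the collective
# goal is consensus on the minimizer of sum_i f_i, i.e. the mean of the a_i.

def distributed_gradient_descent(targets, mixing, steps=2000, alpha0=0.5):
    """targets[i] = a_i; mixing[i][j] = weight agent i puts on agent j."""
    n = len(targets)
    x = [0.0] * n  # each agent's current estimate
    for t in range(1, steps + 1):
        alpha = alpha0 / t  # diminishing step size
        grads = [2.0 * (x[i] - targets[i]) for i in range(n)]
        # average neighbors' estimates, then take a local gradient step
        x = [sum(mixing[i][j] * x[j] for j in range(n)) - alpha * grads[i]
             for i in range(n)]
    return x

# 3 agents, fully connected network with uniform (doubly stochastic) weights
targets = [1.0, 2.0, 6.0]
W = [[1.0 / 3] * 3 for _ in range(3)]
estimates = distributed_gradient_descent(targets, W)
# every agent's estimate approaches the global minimizer, mean(targets) = 3.0
```

With a diminishing step size and doubly stochastic mixing, all agents' estimates contract toward the minimizer of the sum of local functions; the asynchronous gradient-push setting relaxes the synchrony and double-stochasticity assumed here.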
June 19, 2020 Eric Michael Smith, Mary Williamson, Kurt Shuster, Jason Weston, Y-Lan Boureau

Can You Put it All Together: Evaluating Conversational Agents’ Ability to Blend Skills

In this work, we investigate several ways to combine models trained towards isolated capabilities, ranging from simple model aggregation schemes that require minimal additional training, to various forms of multi-task training that encompass several skills at all training stages.
June 16, 2020 Shunsuke Saito, Tomas Simon, Jason Saragih, Hanbyul Joo

PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization

Recent advances in image-based 3D human shape estimation have been driven by the significant improvement in representation power afforded by deep neural networks. Although current approaches have demonstrated the potential in real world settings, they still fail to produce reconstructions with the level of detail often present in the input images. We argue that this limitation stems primarily from two conflicting requirements: accurate predictions require large context, but precise predictions require high resolution.
June 16, 2020 Yihui He, Rui Yan, Katerina Fragkiadaki, Shoou-I Yu

Epipolar Transformers

We propose the differentiable “epipolar transformer”, which enables the 2D detector to leverage 3D-aware features to improve 2D pose estimation. The intuition is: given a 2D location p in the current view, we would like to first find its corresponding point p′ in a neighboring view, and then combine the features at p′ with the features at p, thus leading to a 3D-aware feature at p.
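As an illustration of the matching step described above, the sketch below fuses a reference-view feature with a softmax-weighted average of features sampled along the epipolar line. The function name, the dot-product similarity, the residual fusion weight, and the pre-sampled candidate list are all assumptions for illustration, not the paper's exact architecture.

```python
import math

def epipolar_fusion(feat_p, candidate_feats, lam=0.5):
    """Fuse the feature at 2D location p with an attention-weighted match.

    feat_p: feature vector at location p in the reference view.
    candidate_feats: feature vectors sampled along p's epipolar line in a
                     neighboring view (sampling itself is assumed done).
    lam: residual fusion weight (illustrative choice).
    """
    # dot-product similarity between feat_p and each epipolar candidate
    scores = [sum(a * b for a, b in zip(feat_p, c)) for c in candidate_feats]
    # softmax over candidates along the epipolar line
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # expected matched feature (a soft stand-in for the feature at p')
    matched = [sum(w * c[k] for w, c in zip(weights, candidate_feats))
               for k in range(len(feat_p))]
    # residual-style fusion of the reference feature and the matched feature
    return [f + lam * g for f, g in zip(feat_p, matched)]

fused = epipolar_fusion([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
# the fused feature leans toward the candidate most similar to feat_p
```

The softmax attention makes the hard "find p′" lookup differentiable: rather than picking a single point on the epipolar line, the fusion takes an expectation over candidates weighted by feature similarity.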
June 15, 2020 Zeng Huang, Yuanlu Xu, Christoph Lassner, Hao Li, Tony Tung

ARCH: Animatable Reconstruction of Clothed Humans

In this paper, we propose ARCH (Animatable Reconstruction of Clothed Humans), a novel end-to-end framework for accurate reconstruction of animation-ready 3D clothed humans from a monocular image.
June 14, 2020 Ronghang Hu, Amanpreet Singh, Trevor Darrell, Marcus Rohrbach

Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA

In this work, we propose a novel model for the TextVQA task based on a multimodal transformer architecture accompanied by a rich representation for text in images.
June 14, 2020 Evonne Ng, Donglai Xiang, Hanbyul Joo, Kristen Grauman

You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions

The body pose of a person wearing a camera is of great interest for applications in augmented reality, healthcare, and robotics, yet much of the person’s body is out of view for a typical wearable camera. We propose a learning-based approach to estimate the camera wearer’s 3D body pose from egocentric video sequences.
June 14, 2020 Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollar

Designing Network Design Spaces

In this work, we present a new network design paradigm. Our goal is to help advance the understanding of network design and discover design principles that generalize across settings.
June 14, 2020 Wei-Lin Hsiao, Kristen Grauman

ViBE: Dressing for Diverse Body Shapes

We introduce ViBE, a VIsual Body-aware Embedding that captures clothing’s affinity with different body shapes. Given an image of a person, the proposed embedding identifies garments that will flatter her specific body shape.
June 14, 2020 Ruohan Gao, Tae-Hyun Oh, Kristen Grauman, Lorenzo Torresani

Listen to Look: Action Recognition by Previewing Audio

In the face of the video data deluge, today’s expensive clip-level classifiers are increasingly impractical. We propose a framework for efficient action recognition in untrimmed video that uses audio as a preview mechanism to eliminate both short-term and long-term visual redundancies.