Explore the latest in Facebook Research through publications

All Publications

January 5, 2021 Nirbhay Modhe, Prithvijit Chattopadhyay, Mohit Sharma, Abhishek Das, Devi Parikh, Dhruv Batra, Ramakrishna Vedantam

IR-VIC: Unsupervised Discovery of Sub-goals for Transfer in RL

We propose a novel framework to identify subgoals useful for exploration in sequential decision-making tasks under partial observability.
Paper
January 1, 2021 Mahmoud Assran, Michael Rabbat

Asynchronous Gradient-Push

We consider a multi-agent framework for distributed optimization where each agent has access to a local smooth strongly convex function, and the collective goal is to achieve consensus on the parameters that minimize the sum of the agents’ local functions. We propose an algorithm wherein each agent operates asynchronously and independently of the other agents.
Paper
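
At its core, gradient-push interleaves local gradient steps with the push-sum consensus protocol over a directed communication graph. Below is a minimal synchronous NumPy sketch of that core; the paper's contribution is the asynchronous variant, and the ring topology, step size, and quadratic local objectives here are all illustrative assumptions.

```python
import numpy as np

# Minimal synchronous sketch of push-sum with gradient steps. The paper's
# algorithm is asynchronous; agents here update in lockstep purely for
# illustration. Each agent's local objective is an assumed quadratic
# f_i(z) = 0.5 * (z - t_i)^2, so the consensus minimizer is mean(t).

n, steps, alpha = 4, 200, 0.05
rng = np.random.default_rng(0)

# Column-stochastic mixing matrix over a directed ring: each agent
# "pushes" half its mass to itself and half to its out-neighbor.
P = 0.5 * (np.eye(n) + np.roll(np.eye(n), 1, axis=0))

targets = rng.normal(size=n)   # defines the local objectives
x = np.zeros(n)                # push-sum numerators
w = np.ones(n)                 # push-sum weights
for _ in range(steps):
    z = x / w                  # de-biased parameter estimates
    grad = z - targets         # gradients of the local quadratics
    x = P @ (x - alpha * grad) # push gradient-corrected mass
    w = P @ w                  # push weights

print(x / w, targets.mean())   # every agent approaches mean(targets)
```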
August 27, 2020 Noam Zilberstein

Eliminating Bugs with Dependent Haskell (Experience Report)

Using dependent types in production code is a practical way to eliminate errors. While there are many examples of using dependent Haskell to prove invariants about code, few of these are applied to large scale production systems. Critics claim that dependent types are only useful in toy examples and that they are impractical for use in the real world. This experience report analyzes real world examples where dependent types have enabled us to find and eliminate bugs in production Haskell code.
Paper
August 25, 2020 Sayna Ebrahimi, Franziska Meier, Roberto Calandra, Trevor Darrell, Marcus Rohrbach

Adversarial Continual Learning

Continual learning aims to learn new tasks without forgetting previously learned ones. We hypothesize that representations learned to solve each task in a sequence have a shared structure while containing some task-specific properties. We show that shared features are significantly less prone to forgetting and propose a novel hybrid continual learning framework that learns a disjoint representation for task-invariant and task-specific features required to solve a sequence of tasks.
Paper
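
As a rough sketch of the adversarial ingredient in such a framework (module names, sizes, and losses below are hypothetical, not the paper's exact design): a discriminator is trained to predict the task ID from the shared features, while the shared encoder is trained to fool it, pushing the shared representation toward task invariance; a private per-task encoder keeps the task-specific features.

```python
import torch
import torch.nn as nn

# Hypothetical modules illustrating the shared/private split; the real
# architectures and training schedule come from the paper.
shared = nn.Sequential(nn.Linear(784, 128), nn.ReLU())      # task-invariant
privates = nn.ModuleList(nn.Sequential(nn.Linear(784, 32), nn.ReLU())
                         for _ in range(5))                 # one per task
disc = nn.Linear(128, 5)                                    # predicts task ID
head = nn.Linear(128 + 32, 10)                              # classifier
ce = nn.CrossEntropyLoss()

def losses(x, y, task_id):
    s, p = shared(x), privates[task_id](x)
    task = torch.full((x.size(0),), task_id)
    # Task loss on the concatenated features; the adversarial loss makes
    # the shared features uninformative about which task produced them.
    task_loss = ce(head(torch.cat([s, p], dim=1)), y)
    disc_loss = ce(disc(s.detach()), task)  # train the discriminator
    adv_loss = -ce(disc(s), task)           # train shared encoder to fool it
    return task_loss, disc_loss, adv_loss

x, y = torch.randn(8, 784), torch.randint(0, 10, (8,))
t_loss, d_loss, a_loss = losses(x, y, task_id=2)
```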
August 24, 2020 Hang Chu, Shugao Ma, Fernando De la Torre, Sanja Fidler, Yaser Sheikh

Expressive Telepresence via Modular Codec Avatars

VR telepresence consists of interacting with another human in a virtual space represented by an avatar. Today most avatars are cartoon-like, but soon the technology will allow video-realistic ones. This paper is a step in that direction, presenting Modular Codec Avatars (MCA), a method to generate hyper-realistic faces driven by the cameras in a VR headset.
Paper
August 24, 2020 Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira

Learning to Generate Grounded Visual Captions without Localization Supervision

In this work, we help the model generate grounded captions via a novel cyclical training regimen that forces the model to localize each word in the image after the sentence decoder generates it, and then reconstruct the sentence from the localized image region(s) to match the ground truth. Our proposed framework only requires learning one extra fully-connected layer (the localizer), which can be removed at test time.
Paper
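
A rough sketch of one cycle under assumed tensor shapes: the localizer, a single fully-connected layer, projects each generated word into the region-feature space, attention over regions grounds the word, and the localized region features would then be used to reconstruct the sentence. Everything below besides that one-extra-layer idea is an illustrative assumption.

```python
import torch
import torch.nn as nn

# Illustrative shapes: B sentences, N regions of dim D, words of dim W.
B, N, D, W, steps = 2, 36, 512, 512, 8
regions = torch.randn(B, N, D)        # pre-extracted region features
word_emb = torch.randn(B, steps, W)   # embeddings of the generated words

# The localizer is the single extra fully-connected layer: it maps each
# word into region space so attention can ground it (removable at test time).
localizer = nn.Linear(W, D)

q = localizer(word_emb)                                          # (B, steps, D)
attn = torch.softmax(q @ regions.transpose(1, 2) / D**0.5, dim=-1)
grounded = attn @ regions                                        # localized region features
# `grounded` would now be decoded again; a reconstruction loss against the
# ground-truth sentence closes the generate -> localize -> reconstruct cycle.
```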
August 23, 2020 Gyeongsik Moon, Shoou-I Yu, He Wen, Takaaki Shiratori, Kyoung Mu Lee

InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image

Analysis of hand-hand interactions is a crucial step towards better understanding human behavior. However, most research in 3D hand pose estimation has focused on the isolated single-hand case. We therefore propose (1) a large-scale dataset, InterHand2.6M, and (2) a baseline network, InterNet, for 3D interacting hand pose estimation from a single RGB image.
Paper
August 23, 2020 Jason Y. Zhang, Sam Pepose, Hanbyul Joo, Deva Ramanan, Jitendra Malik, Angjoo Kanazawa

Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild

We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene, all from a single in-the-wild image captured in an uncontrolled environment. Notably, our method runs on datasets without any scene- or object-level 3D supervision. Our key insight is that considering humans and objects jointly gives rise to “3D common sense” constraints that can be used to resolve ambiguity.
Paper
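
To see why joint reasoning helps, note that an object's scale and depth trade off perfectly under reprojection alone, so a single image leaves a whole family of solutions; a human-contact constraint can pick one out. The toy 1-D optimization below illustrates this; the numbers and the specific interaction term are assumptions, not the paper's losses.

```python
import torch

# Toy illustration: an object's depth z and scale s are interchangeable
# under reprojection alone (projected size ~ s / z), so reprojection has
# a whole valley of minimizers. A human-contact prior selects one.
z = torch.tensor(2.0, requires_grad=True)   # object depth (unknown)
s = torch.tensor(1.0, requires_grad=True)   # object scale (unknown)
observed_size = 0.5                         # object's apparent size in the image
hand_depth = 4.0                            # depth of the estimated hand

opt = torch.optim.Adam([z, s], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    reproj = (s / z - observed_size) ** 2   # satisfied by any s = 0.5 * z
    contact = (z - hand_depth) ** 2         # a held object sits near the hand
    (reproj + 0.1 * contact).backward()
    opt.step()

print(z.item(), s.item())   # z -> 4.0, s -> 2.0: the ambiguity is resolved
```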
August 23, 2020 Medhini Narasimhan, Erik Wijmans, Xinlei Chen, Trevor Darrell, Dhruv Batra, Devi Parikh, Amanpreet Singh

Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation

We introduce a learning-based approach for room navigation using semantic maps. Our proposed architecture learns to predict top-down belief maps of regions that lie beyond the agent’s field of view while modeling architectural and stylistic regularities in houses. First, we train a model to generate amodal semantic top-down maps indicating beliefs of location, size, and shape of rooms by learning the underlying architectural patterns in houses. Next, we use these maps to predict a point that lies in the target room and train a policy to navigate to the point.
Paper
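
In sketch form, the pipeline has two stages: a predictor that outputs amodal per-room belief maps over a top-down grid (including regions the agent has not yet seen), and goal selection that takes the most confident cell for the target room as the point handed to the navigation policy. The architecture and sizes below are illustrative assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

NUM_ROOMS, GRID = 9, 64   # assumed room categories and top-down grid size

# Stage 1: predict amodal per-room belief maps from an encoding of the
# agent's observations so far (a single feature vector here, for brevity).
map_predictor = nn.Sequential(
    nn.Linear(512, 32 * 8 * 8), nn.ReLU(), nn.Unflatten(1, (32, 8, 8)),
    nn.Upsample(size=(GRID, GRID)), nn.Conv2d(32, NUM_ROOMS, 3, padding=1),
)

obs_feat = torch.randn(1, 512)               # placeholder observation code
beliefs = map_predictor(obs_feat).sigmoid()  # (1, NUM_ROOMS, GRID, GRID)

# Stage 2: pick the most confident cell of the target room as the goal;
# a separately trained policy would then navigate to that point.
target_room = 3                              # e.g., an assumed "kitchen" index
idx = beliefs[0, target_room].argmax().item()
goal = divmod(idx, GRID)                     # (row, col) of the goal cell
print(goal)
```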
August 23, 2020 Yash Kant, Dhruv Batra, Peter Anderson, Alexander Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal

Spatially Aware Multimodal Transformers for TextVQA

Textual cues are essential for everyday tasks like buying groceries and using public transport. To develop assistive technology for such tasks, we study the TextVQA task, i.e., reasoning about text in images to answer a question. Existing approaches are limited in their use of spatial relations and rely on fully-connected transformer-based architectures to implicitly learn the spatial structure of a scene. In contrast, we propose a novel spatially aware self-attention layer such that each visual entity only looks at neighboring entities defined by a spatial graph.
Paper
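
The mechanism amounts to masked self-attention: attention logits between two visual entities are kept only if the entities are connected in the spatial graph and suppressed otherwise. A minimal single-head sketch, assuming boxes given as (x1, y1, x2, y2) and a simple center-distance threshold standing in for the paper's typed spatial relations:

```python
import torch

def spatial_mask(boxes, max_dist=0.3):
    """Adjacency from box centers: True where two entities are neighbors.
    A plain distance threshold stands in for the paper's typed relations."""
    centers = (boxes[:, :2] + boxes[:, 2:]) / 2
    return torch.cdist(centers, centers) < max_dist

def spatially_aware_attention(x, boxes):
    """Self-attention in which each entity attends only to spatial neighbors."""
    d = x.size(-1)
    q, k, v = x, x, x   # single head, no projections, for brevity
    logits = q @ k.transpose(-2, -1) / d ** 0.5
    logits = logits.masked_fill(~spatial_mask(boxes), float("-inf"))
    return torch.softmax(logits, dim=-1) @ v

# Ten entities with small boxes in a unit image, 64-d visual features.
boxes = torch.rand(10, 2).repeat(1, 2) + torch.tensor([0.0, 0.0, 0.1, 0.1])
feats = torch.randn(10, 64)
out = spatially_aware_attention(feats, boxes)   # (10, 64)
```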