Explore the latest research from Facebook

All Publications

December 13, 2021 Yangyang Xia, Buye Xu, Anurag Kumar
Paper

Incorporating Real-world Noisy Speech in Neural-network-based Speech Enhancement Systems

In this paper, we explore methods that enable supervised speech enhancement systems to train on real-world degraded speech data. Specifically, we propose a semi-supervised approach for speech enhancement in which we first train a modified vector-quantized variational autoencoder that solves a source separation task.
October 31, 2021 Pedro Rodriguez, Jordan Boyd-Graber
Paper

Evaluation Paradigms in Question Answering

This position paper names and distinguishes these paradigms. Despite substantial overlap, subtle but significant distinctions exert an outsize influence on research. While one evaluation paradigm values creating more intelligent QA systems, the other paradigm values building QA systems that appeal to users.
October 10, 2021 Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin
Paper

Emerging Properties in Self-Supervised Vision Transformers

In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) [16] that stand out compared to convolutional networks (convnets).
October 1, 2021 Hung Le, Chinnadhurai Sankar, Seungwhan Moon, Ahmad Beirami, Alborz Geramifard, Satwik Kottur
Paper

DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue

The dataset is designed to contain minimal biases and has detailed annotations for the different types of reasoning over the spatio-temporal space of video. Dialogues are synthesized over multiple question turns, each of which is injected with a set of cross-turn semantic relationships. We use DVD to analyze existing approaches, providing interesting insights into their abilities and limitations.
September 27, 2021 Kalyan Vasudev Alwala, Mustafa Mukadam
Paper

Joint Sampling and Trajectory Optimization over Graphs for Online Motion Planning

Here we consider highly dynamic environments with long horizons that necessitate a fast online solution. We present a unified approach that leverages the complementary strengths of sampling and optimization, and interleaves them both in a manner that is well suited to this challenging problem.
September 1, 2021 Naoki Yokoyama, Sehoon Ha, Dhruv Batra
Paper

Success Weighted by Completion Time: A Dynamics-Aware Evaluation Criteria for Embodied Navigation

We present Success weighted by Completion Time (SCT), a new metric for evaluating navigation performance for mobile robots. Several related works on navigation have used Success weighted by Path Length (SPL) as the primary method of evaluating the path an agent makes to a goal location, but SPL is limited in its ability to properly evaluate agents with complex dynamics.
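
As a rough illustration of the SPL baseline mentioned above, the sketch below uses the commonly cited definition, SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i), where S_i is a success flag, l_i the shortest-path length, and p_i the length of the path the agent took. The function name and the tuple layout are hypothetical, and the SCT formula is not given in this summary, so it is not reproduced here.

```python
from typing import Iterable, Tuple

# Each episode is (success, shortest_path_length, agent_path_length).
# This layout is an assumption made for the example only.
Episode = Tuple[bool, float, float]


def success_weighted_by_path_length(episodes: Iterable[Episode]) -> float:
    """Average of S_i * l_i / max(p_i, l_i) over all episodes.

    Assumes every shortest-path length is strictly positive.
    """
    episodes = list(episodes)
    if not episodes:
        raise ValueError("at least one episode is required")

    total = 0.0
    for success, shortest, taken in episodes:
        if shortest <= 0:
            raise ValueError("shortest path length must be positive")
        if success:
            # A failed episode contributes 0; a successful one is
            # discounted by how much longer the agent's path was.
            total += shortest / max(taken, shortest)
    return total / len(episodes)


example_episodes = [
    (True, 10.0, 10.0),   # optimal path: contributes 1.0
    (True, 10.0, 20.0),   # twice as long: contributes 0.5
    (False, 10.0, 5.0),   # failure: contributes 0.0
]
spl_value = success_weighted_by_path_length(example_episodes)
```

Because the score depends only on path length, two agents that follow the same path at very different speeds get the same value, which is the limitation the abstract points to.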
August 31, 2021 Chunyang Wu, Zhiping Xiu, Yangyang Shi, Ozlem Kalinli, Christian Fuegen, Thilo Koehler, Qing He
Paper

Transformer-based Acoustic Modeling for Streaming Speech Synthesis

To address the complexity issue in the speech synthesis domain, this paper proposes an efficient transformer-based acoustic model that is constant-speed regardless of input sequence length, making it ideal for streaming speech synthesis applications.
August 31, 2021 Ilyes Khemakhem, Ricardo P. Monti, Robert Leech, Aapo Hyvärinen
Paper

Causal Autoregressive Flows

In this work, we highlight an intrinsic correspondence between a simple family of autoregressive normalizing flows and identifiable causal models. We exploit the fact that autoregressive flow architectures define an ordering over variables, analogous to a causal ordering, to show that they are well-suited to performing a range of causal inference tasks, ranging from causal discovery to making interventional and counterfactual predictions.
August 29, 2021 Anurag Kumar, Yun Wang, Vamsi Krishna Ithapu, Christian Fuegen
Paper

Do Sound Event Representations Generalize To Other Audio Tasks? A Case Study In Audio Transfer Learning

In this paper, we investigate the transfer learning capacity of audio representations obtained from neural networks trained on a large-scale sound event detection dataset.
August 29, 2021 Shaked Dovrat, Eliya Nachmani, Lior Wolf
Paper

Many-Speakers Single Channel Speech Separation with Optimal Permutation Training

In this work, we present a permutation invariant training that employs the Hungarian algorithm in order to train with an O(C^3) time complexity, where C is the number of speakers, in comparison to O(C!) of PIT-based methods.
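
To make the complexity comparison concrete, the following sketch contrasts an exhaustive search over all C! speaker permutations with a textbook O(C^3) Hungarian assignment on the same cost matrix. It is a generic illustration, not the paper's training code: the matrix entry `cost[i][j]` is assumed to hold a pairwise loss between estimated source i and reference source j, and all function names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence


def brute_force_assignment(cost: Sequence[Sequence[float]]) -> List[int]:
    """Try every permutation (O(C!)) and return the cheapest one."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    return list(best)


def hungarian_assignment(cost: Sequence[Sequence[float]]) -> List[int]:
    """Minimum-cost assignment for a square matrix in O(C^3).

    Returns a list where entry i is the column assigned to row i.
    Uses the standard potentials-based formulation with 1-based indices.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)          # row potentials
    v = [0.0] * (n + 1)          # column potentials
    match = [0] * (n + 1)        # match[j] = row currently assigned to column j
    way = [0] * (n + 1)          # back-pointers along the augmenting path

    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = match[j0]
            delta = inf
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            # Shift potentials so at least one new zero-reduced-cost edge appears.
            for j in range(n + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while True:
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
            if j0 == 0:
                break

    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[match[j] - 1] = j - 1
    return assignment


def assignment_cost(cost: Sequence[Sequence[float]], assignment: Sequence[int]) -> float:
    return sum(cost[i][assignment[i]] for i in range(len(cost)))


# Hypothetical pairwise losses for C = 3 sources.
example_cost = [[4, 1, 3],
                [2, 0, 5],
                [3, 2, 2]]
hungarian_result = hungarian_assignment(example_cost)
brute_result = brute_force_assignment(example_cost)
```

Both routines return the same minimum-cost matching on this matrix; the difference is that the exhaustive search grows factorially with the number of speakers, while the assignment solver grows cubically.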