May 4, 2020

SeCoST: Sequential Co-Supervision for Weakly Labeled Audio Event Detection

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Weakly supervised learning algorithms are critical for scaling audio event detection to several hundred sound categories. Such learning models should not only disambiguate sound events efficiently with minimal class-specific annotation but also be robust to label noise, which is more prevalent with weak labels than with strong annotations. In this work, we propose a new framework for designing learning models with weak supervision by bridging ideas from sequential learning and knowledge distillation.

By: Anurag Kumar, Vamsi Krishna Ithapu
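
A minimal sketch of the kind of weak-label-plus-distillation objective the abstract describes, assuming a multi-label setup with binary cross-entropy and a mixing weight alpha; this is a generic illustration, not the paper's exact SeCoST formulation.

    import torch.nn.functional as F

    def co_supervision_loss(student_logits, teacher_probs, weak_labels, alpha=0.5):
        # supervision from the (possibly noisy) clip-level weak labels
        weak_term = F.binary_cross_entropy_with_logits(student_logits, weak_labels)
        # distillation: match soft targets from a previously trained teacher model
        distill_term = F.binary_cross_entropy_with_logits(student_logits, teacher_probs)
        # alpha trades off trust in the weak labels against trust in the teacher
        return alpha * weak_term + (1 - alpha) * distill_term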

April 27, 2020

Generalization through Memorization: Nearest Neighbor Language Models

International Conference on Learning Representations (ICLR)

We introduce kNN-LMs, which extend a pre-trained neural language model (LM) by linearly interpolating it with a k-nearest neighbors (kNN) model. The nearest neighbors are computed according to distance in the pre-trained LM embedding space, and can be drawn from any text collection, including the original LM training data.

By: Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis
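
The interpolation itself is simple enough to sketch. Below is a minimal NumPy version, assuming L2 distance in the embedding space, a softmax over negative distances to weight the k neighbors, and p(y|x) = λ · p_kNN(y|x) + (1 − λ) · p_LM(y|x); the datastore arrays and hyperparameter values here are illustrative.

    import numpy as np

    def knn_lm_probs(lm_probs, query, keys, values, vocab_size, k=8, lam=0.25):
        # keys: (N, d) datastore of context embeddings; values: (N,) next tokens
        dists = np.linalg.norm(keys - query, axis=1)
        nn = np.argsort(dists)[:k]
        # weight the k nearest neighbors by a softmax over negative distances
        w = np.exp(-dists[nn])
        w /= w.sum()
        knn_probs = np.zeros(vocab_size)
        for token, weight in zip(values[nn], w):
            knn_probs[token] += weight
        # linearly interpolate the kNN distribution with the LM distribution
        return lam * knn_probs + (1 - lam) * lm_probs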

April 25, 2020

Permutation Equivariant Models for Compositional Generalization in Language

International Conference on Learning Representations (ICLR)

Humans understand novel sentences by composing meanings and roles of core language components. In contrast, neural network models for natural language modeling fail when such compositional generalization is required. The main contribution of this paper is to hypothesize that language compositionality is a form of group-equivariance. Based on this hypothesis, we propose a set of tools for constructing equivariant sequence-to-sequence models.

By: Jonathan Gordon, David Lopez-Paz, Marco Baroni, Diane Bouchacourt
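
As a toy illustration of the property the hypothesis rests on (not the paper's architecture): a map f is equivariant to a permutation of the vocabulary when permuting the input and then applying f equals applying f and then permuting the output, i.e. f(g · x) = g · f(x). A single shared vocabulary is a simplification here; the paper treats input and output permutations separately.

    import numpy as np

    def check_equivariance(f, x, perm):
        # f maps an int array of token ids to an int array of token ids;
        # perm is a permutation of the vocabulary (perm[i] relabels id i)
        return np.array_equal(f(perm[x]), perm[f(x)])

    # the identity map is trivially equivariant:
    x = np.array([2, 0, 1])
    perm = np.array([1, 2, 0])
    assert check_equivariance(lambda s: s, x, perm)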

April 9, 2020

Environment-aware reconfigurable noise suppression

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

The paper proposes an efficient, robust, and reconfigurable technique to suppress various types of noise at any sampling rate. Theoretical analyses and both subjective and objective test results show that the proposed noise suppression (NS) solution significantly improves the speech transmission index (STI), speech intelligibility (SI), signal-to-noise ratio (SNR), and subjective listening experience.

By: Jun Yang, Joshua Bingham

February 14, 2020

Neural Machine Translation with Byte-Level Subwords

Conference on Artificial Intelligence (AAAI)

In this paper, we investigate byte-level subwords, specifically byte-level BPE (BBPE), which is more compact than a character vocabulary and has no out-of-vocabulary tokens, but is more efficient than using pure bytes alone. We claim that contextualizing BBPE embeddings is necessary, which can be implemented by a convolutional or recurrent layer.

By: Changhan Wang, Kyunghyun Cho, Jiatao Gu
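
A minimal sketch of why the byte-level base vocabulary is closed (the BPE merge learning itself is omitted): any Unicode string decomposes into UTF-8 bytes, so the base vocabulary has exactly 256 symbols and no input can be out-of-vocabulary.

    def to_byte_tokens(text):
        # represent any string over a fixed 256-symbol byte alphabet;
        # BBPE then learns merges over these byte tokens (not shown here)
        return list(text.encode("utf-8"))

    tokens = to_byte_tokens("naïve 日本語")
    # every token falls in range(256): no out-of-vocabulary symbols, for any input
    assert all(0 <= t < 256 for t in tokens)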

February 14, 2020

RTFM: Generalising to New Environment Dynamics via Reading

International Conference on Learning Representations (ICLR)

Obtaining policies that can generalise to new environments in reinforcement learning is challenging. In this work, we demonstrate that language understanding via a reading policy learner is a promising vehicle for generalisation to new environments. We propose a grounded policy learning problem, Read to Fight Monsters (RTFM), in which the agent must jointly reason over a language goal, relevant dynamics described in a document, and environment observations.

By: Victor Zhong, Tim Rocktäschel, Edward Grefenstette

January 13, 2020

Scaling up online speech recognition using ConvNets

arXiv

We design an online end-to-end speech recognition system based on Time-Depth Separable (TDS) convolutions and Connectionist Temporal Classification (CTC). The system has almost three times the throughput of a well-tuned hybrid ASR baseline, while also having lower latency and a better word error rate.

By: Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
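
The CTC half of such a system is easy to illustrate. Below is the standard greedy CTC collapse rule (merge consecutive repeats, then drop blanks); this is generic CTC decoding, not this particular system's decoder.

    def ctc_greedy_decode(frame_ids, blank=0):
        # collapse per-frame argmax labels: merge runs of identical labels,
        # then remove the blank symbol
        decoded, prev = [], None
        for label in frame_ids:
            if label != prev and label != blank:
                decoded.append(label)
            prev = label
        return decoded

    # e.g. the frame sequence [0, 3, 3, 0, 5, 5, 5, 0] collapses to [3, 5]
    assert ctc_greedy_decode([0, 3, 3, 0, 5, 5, 5, 0]) == [3, 5]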

December 14, 2019

From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition

IEEE Automatic Speech Recognition and Understanding Workshop

There is an implicit assumption that traditional hybrid approaches for automatic speech recognition (ASR) cannot directly model graphemes and need to rely on phonetic lexicons to get competitive performance, especially on English, which has poor grapheme-phoneme correspondence. In this work, we show for the first time that, on English, hybrid ASR systems can in fact model graphemes effectively by leveraging tied context-dependent graphemes, i.e., chenones.

By: Duc Le, Xiaohui Zhang, Weiyi Zhang, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer

December 13, 2019

Lead2Gold: Towards exploiting the full potential of noisy transcriptions for speech recognition

Automatic Speech Recognition and Understanding Workshop

The transcriptions used to train an Automatic Speech Recognition (ASR) system may contain errors. Usually, either a quality control stage discards transcriptions with too many errors, or the noisy transcriptions are used as is. We introduce Lead2Gold, a method to train an ASR system that exploits the full potential of noisy transcriptions.

By: Adrien Dufraux, Emmanuel Vincent, Awni Hannun, Armelle Brun, Matthijs Douze

December 12, 2019

Compositional generalization through meta sequence-to-sequence learning

Neural Information Processing Systems (NeurIPS)

People can learn a new concept and use it compositionally, understanding how to “blicket twice” after learning how to “blicket.” In contrast, powerful sequence-to-sequence (seq2seq) neural networks fail such tests of compositionality, especially when composing new concepts together with existing concepts. In this paper, I show how memory-augmented neural networks can be trained to generalize compositionally through meta seq2seq learning.

By: Brenden Lake
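
A toy sketch of the episode structure meta seq2seq training relies on (illustrative only, not the paper's code): each episode re-assigns the nonce primitives to actions, so a memory-augmented model can succeed only by reading the episode's support set and composing, e.g., "blicket twice" from that episode's meaning of "blicket".

    import random

    PRIMITIVES = ["dax", "blicket", "wif"]
    ACTIONS = ["RED", "GREEN", "BLUE"]

    def sample_episode():
        # the word-to-action mapping is re-randomized every episode, so it
        # must be inferred from the support set rather than memorized
        mapping = dict(zip(PRIMITIVES, random.sample(ACTIONS, len(ACTIONS))))
        support = [(word, [mapping[word]]) for word in PRIMITIVES]
        # the query composes a known modifier ("twice") with an episode primitive
        word = random.choice(PRIMITIVES)
        query = (word + " twice", [mapping[word], mapping[word]])
        return support, query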