Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR

Interspeech

Abstract

End-to-end approaches to automatic speech recognition, such as Listen-Attend-Spell (LAS), blend all components of a traditional speech recognizer into a unified model. Although this simplifies training and decoding pipelines, a unified model is hard to adapt when there is a mismatch between training and test data, especially when this information changes dynamically. The Contextual LAS (CLAS) framework addresses this problem by encoding contextual entities into fixed-dimensional embeddings and using an attention mechanism to model the probability of seeing these entities. In this work, we improve the CLAS approach by proposing several new strategies for extracting embeddings of the contextual entities. We compare embedding extractors based on graphemic and phonetic input and/or output sequences, and show that an encoder-decoder model trained jointly toward graphemes and phonemes outperforms the other approaches. Leveraging phonetic information yields better discrimination between similarly written graphemic sequences and also helps the model generalize to graphemic sequences unseen in training. We show significant improvements over the original CLAS approach and demonstrate that the proposed method scales much better to a large number of contextual entities across multiple domains.
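For intuition, the following is a minimal Python (PyTorch) sketch of the bias-attention idea the abstract describes: each contextual entity is encoded into one fixed-dimensional embedding, and the decoder attends over those embeddings at each step. All names here (BiasEncoder, bias_attention) and the dot-product scoring are illustrative assumptions, not the paper's released implementation; CLAS itself uses a learned additive attention, and this paper's contribution is to train the entity encoder jointly toward grapheme and phoneme targets.

    # Illustrative sketch only (assumed PyTorch setup); not the paper's code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BiasEncoder(nn.Module):
        """Embeds each contextual entity (a padded token-ID sequence) into
        one fixed-dimensional vector.

        In the paper's joint variant, an encoder-decoder like this would be
        trained to predict both the grapheme and the phoneme sequence of the
        entity; only the encoder side is sketched here.
        """
        def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

        def forward(self, entities: torch.Tensor) -> torch.Tensor:
            # entities: (num_entities, max_len) -> (num_entities, hidden_dim)
            _, (h_n, _) = self.lstm(self.embed(entities))
            return h_n[-1]

    def bias_attention(decoder_state: torch.Tensor,
                       bias_embeddings: torch.Tensor) -> torch.Tensor:
        """One decoding step: attend over the entity embeddings and return
        a bias context vector for the decoder.

        decoder_state: (batch, hidden_dim)
        bias_embeddings: (num_entities, hidden_dim)
        Dot-product scoring is a simplification of the learned attention in CLAS.
        """
        scores = decoder_state @ bias_embeddings.T    # (batch, num_entities)
        weights = F.softmax(scores, dim=-1)           # P(entity | decoder state)
        return weights @ bias_embeddings              # (batch, hidden_dim)

In the CLAS framework, a bias context vector like this is combined with the acoustic attention context at each decoder step, so the model can raise the probability of entities that are relevant in the current context.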

Related Publications

Libri-light: A benchmark for ASR with limited or no supervision

Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdelrahman Mohamed, Emmanuel Dupoux

ICASSP - May 4, 2020

Spatial Attention for Far-Field Speech Recognition with Deep Beamforming Neural Networks

Weipeng He, Lu Lu, Biqiao Zhang, Jay Mahadeokar, Kaustubh Kalgaonkar, Christian Fuegen

ICASSP - May 8, 2020

An Empirical Study of Transformer-Based Neural Language Model Adaptation

Ke Li, Zhe Liu, Tianxiao Shen, Hongzhao Huang, Fuchun Peng, Daniel Povey, Sanjeev Khudanpur

ICASSP - May 9, 2020
