Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency



We propose a dynamic encoder transducer (DET) for on-device speech recognition. One DET model scales to multiple devices with different computation capacities without retraining or fine-tuning. To trading off accuracy and latency, DET assigns different encoders to decode different parts of an utterance. We apply and compare the layer dropout and the collaborative learning for DET training. The layer dropout method that randomly drops out encoder layers in the training phase, can do on-demand layer dropout in decoding. Collaborative learning jointly trains multiple encoders with different depths in one single model. Experiment results on Librispeech and in-house data show that DET provides a flexible accuracy and latency trade-off. Results on Librispeech show that the full-size encoder in DET relatively reduces the word error rate of the same size baseline by over 8%. The lightweight encoder in DET trained with collaborative learning reduces the model size by 25% but still gets similar WER as the full-size baseline. DET gets similar accuracy as a baseline model with better latency on a large in-house data set by assigning a lightweight encoder for the beginning part of one utterance and a full-size encoder for the rest.

Related Publications

All Publications

AKBC - October 3, 2021

Relation Prediction as an Auxiliary Training Objective for Improving Multi-Relational Graph Representations

Yihong Chen, Pasquale Minervini, Sebastian Riedel, Pontus Stenetorp

ICCV - October 11, 2021

Contrast and Classify: Training Robust VQA Models

Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal

IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) - December 13, 2021

Kaizen: Continuously Improving Teacher Using Exponential Moving Average For Semi-supervised Speech Recognition

Vimal Manohar, Tatiana Likhomanenko, Qiantong Xu, Wei-Ning Hsu, Ronan Collobert, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed

ACL - August 1, 2021

Gender Bias Amplification During Speed-Quality Optimization in Neural Machine Translation

Adithya Renduchintala, Denise Diaz, Kenneth Heafield, Xian Li, Mona Diab

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookies Policy