Kaizen: Continuously Improving Teacher Using Exponential Moving Average For Semi-supervised Speech Recognition

IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)


In this paper, we introduce the Kaizen framework that uses a continuously improving teacher to generate pseudo-labels for semi-supervised speech recognition (ASR). The proposed approach uses a teacher model which is updated as the exponential moving average (EMA) of the student model parameters. We demonstrate that it is critical for EMA to be accumulated with full-precision floating point. The Kaizen framework can be seen as a continuous version of the iterative pseudo-labeling approach for semi-supervised training. It is applicable for different training criteria, and in this paper we demonstrate its effectiveness for frame-level hybrid hidden Markov model-deep neural network (HMM-DNN) systems as well as sequence-level Connectionist Temporal Classification (CTC) based models. For large scale real-world unsupervised public videos in UK English and Italian languages the proposed approach i) shows more than 10% relative word error rate (WER) reduction over standard teacher-student training; ii) using just 10 hours of supervised data and a large amount of unsupervised data closes the gap to the upper-bound supervised ASR system that uses 650h or 2700h respectively.

Related Publications

All Publications

AKBC - October 3, 2021

Relation Prediction as an Auxiliary Training Objective for Improving Multi-Relational Graph Representations

Yihong Chen, Pasquale Minervini, Sebastian Riedel, Pontus Stenetorp

ICCV - October 11, 2021

Contrast and Classify: Training Robust VQA Models

Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal

Interspeech - August 30, 2021

Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency

Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer

ACL - August 1, 2021

Gender Bias Amplification During Speed-Quality Optimization in Neural Machine Translation

Adithya Renduchintala, Denise Diaz, Kenneth Heafield, Xian Li, Mona Diab

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookies Policy