Improving Speech Translation by Understanding and Learning from the Auxiliary Text Translation Task

Association for Computational Linguistics (ACL)


Pretraining and multitask learning are widely used to improve speech-to-text translation performance. In this study, we train a speech-to-text translation model jointly with an auxiliary text-to-text translation task. We conduct a detailed analysis to understand the impact of the auxiliary task on the primary task within the multitask learning framework. Our analysis confirms that multitask learning tends to generate similar decoder representations from different modalities and to preserve more information from the pretrained text translation modules. We observe minimal negative transfer between the two tasks, and sharing more parameters helps transfer knowledge from the text task to the speech task. The analysis also reveals that the difference between modality representations at the top decoder layers is still not negligible, and that those layers are critical for translation quality. Inspired by these findings, we propose three methods to improve translation quality. First, a parameter sharing and initialization strategy enhances information sharing between the tasks. Second, a novel attention-based regularization for the encoders pulls the representations from different modalities closer. Third, online knowledge distillation enhances knowledge transfer from the text task to the speech task. Our experiments show that the proposed approach improves translation performance by more than 2 BLEU over a strong baseline and achieves state-of-the-art results on the MuST-C English-German, English-French and English-Spanish language pairs.
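
The third method, online knowledge distillation, can be illustrated with a minimal per-token sketch. The abstract does not give the exact loss, so this assumes a common formulation: the speech task's output distribution is trained against a weighted mix of the reference token and the text task's output distribution for the same target position. The function names and the `kd_weight` parameter are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """Cross-entropy of pred_probs measured against target_probs."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, pred_probs))


def online_kd_loss(speech_logits, text_logits, gold_index, kd_weight=0.5):
    """Per-token loss for the speech task (assumed formulation, not the paper's).

    speech_logits: decoder scores when the input is speech (student).
    text_logits:   decoder scores when the input is the transcript (teacher);
                   in a real framework these would be treated as constants.
    gold_index:    index of the reference target token.
    kd_weight:     hypothetical mixing weight between the two terms.
    """
    speech_probs = softmax(speech_logits)
    teacher_probs = softmax(text_logits)
    one_hot = [1.0 if i == gold_index else 0.0 for i in range(len(speech_logits))]
    nll = cross_entropy(one_hot, speech_probs)        # usual translation loss
    kd = cross_entropy(teacher_probs, speech_probs)   # match the text task's output
    return (1.0 - kd_weight) * nll + kd_weight * kd


# Toy vocabulary of three tokens; the reference token is index 0.
loss = online_kd_loss([1.5, 0.2, -0.7], [2.0, 0.5, -1.0], gold_index=0)
print(f"per-token loss: {loss:.4f}")
```

The distillation is "online" in the sense that the teacher distribution comes from the text task as it is trained jointly in the same multitask model, not from a separately trained, frozen model. Setting `kd_weight` to 0 in this sketch recovers the plain negative log-likelihood.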

