Spatial Attention for Far-Field Speech Recognition with Deep Beamforming Neural Networks

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)


In this paper, we introduce spatial attention for refining the information in a multi-direction neural beamformer for far-field automatic speech recognition. Previous neural beamformers with multiple look directions, such as the factored complex linear projection, have shown promising results. However, the features extracted by such methods contain redundant information, as only the direction of the target speech is relevant. We propose a spatial attention subnet that weighs the features from different directions, so that the subsequent acoustic model can focus on the features most relevant to speech recognition. Our experimental results show that spatial attention achieves up to a 9% relative word error rate improvement over methods without attention.
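The core idea of the abstract is to pool per-direction beamformed features with learned attention weights before the acoustic model. The sketch below illustrates that mechanism in NumPy; the one-hidden-layer scoring network (parameters `w`, `b`, `v`) and all shapes are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(features, w, b, v):
    """Combine per-direction features with attention weights.

    features: (T, D, F) array -- T frames, D look directions,
              F beamformed features per direction.
    w, b, v:  parameters of a hypothetical one-hidden-layer
              scoring subnet (illustrative, not from the paper).
    Returns the (T, F) attended features and the (T, D) weights.
    """
    h = np.tanh(features @ w + b)      # (T, D, H) hidden scores
    scores = h @ v                     # (T, D) one score per direction
    alpha = softmax(scores, axis=-1)   # attention over look directions
    combined = (alpha[..., None] * features).sum(axis=1)  # (T, F)
    return combined, alpha

# Toy usage: 5 frames, 4 look directions, 8 features, hidden size 6.
rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 4, 8))
w, b, v = rng.standard_normal((8, 6)), np.zeros(6), rng.standard_normal(6)
out, alpha = spatial_attention(feats, w, b, v)
```

In this sketch the acoustic model would consume `out`, a single feature stream in which directions carrying the target speech dominate, rather than all D redundant streams.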

Related Publications


Libri-light: A benchmark for ASR with limited or no supervision

Jacob Kahn, Morgan Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdelrahman Mohamed, Emmanuel Dupoux

ICASSP - May 4, 2020

An Empirical Study of Transformer-Based Neural Language Model Adaptation

Ke Li, Zhe Liu, Tianxiao Shen, Hongzhao Huang, Fuchun Peng, Daniel Povey, Sanjeev Khudanpur

ICASSP - May 9, 2020

SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation

Arya D. McCarthy, Liezl Puzon, Juan Pino

ICASSP - May 7, 2020
