Neural Synthesis of Binaural Speech from Mono Audio

International Conference on Learning Representations (ICLR)


We present a neural rendering approach for binaural sound synthesis that can produce realistic and spatially accurate binaural sound in realtime. The network takes, as input, a single-channel audio source and synthesizes, as output, two-channel binaural sound, conditioned on the relative position and orientation of the listener with respect to the source. We investigate deficiencies of the ℓ2-loss on raw wave-forms in a theoretical analysis and introduce an improved loss that overcomes these limitations. In an empirical evaluation, we establish that our approach is the first to generate spatially accurate waveform outputs (as measured by real recordings) and outperforms existing approaches by a considerable margin, both quantitatively and in a perceptual study. Dataset and code are available online:


Related Publications

All Publications

SIGGRAPH - August 9, 2021

Control Strategies for Physically Simulated Characters Performing Two-player Competitive Sports

Jungdam Won, Deepak Gopinath, Jessica Hodgins

ICML - July 18, 2021

Align, then memorise: the dynamics of learning with feedback alignment

Maria Refinetti, Stéphane d'Ascoli, Ruben Ohana, Sebastian Goldt

CVPR - June 18, 2021

Improving Panoptic Segmentation at All Scales

Lorenzo Porzi, Samuel Rota Bulò, Peter Kontschieder

CVPR - June 21, 2021

KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA

Kenneth Marino, Xinlei Chen, Devi Parikh, Abhinav Gupta, Marcus Rohrbach

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookies Policy