
Unsupervised Cross-Domain Singing Voice Conversion

Interspeech


Abstract

We present a wav-to-wav generative model for singing voice conversion from any identity. Our method combines features from an acoustic model trained for automatic speech recognition with extracted melody features to drive a waveform-based generator. The proposed generative architecture is invariant to the speaker’s identity and can be trained to generate target singers from unlabeled training data, using either speech or singing sources. The model is optimized end-to-end without any manual supervision, such as lyrics, musical notes, or parallel samples. The proposed approach is fully convolutional and can generate audio in real time. Experiments show that our method significantly outperforms the baseline methods and generates convincingly better audio samples than alternative approaches.
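The abstract describes a generator driven by two inputs: features from an acoustic model trained for speech recognition and melody features. As an illustration only, the sketch below shows one way such frame-level inputs could be fused into a per-sample conditioning signal for a waveform generator. The helper name `build_conditioning`, the log-F0 plus voicing-flag melody encoding, and the nearest-neighbour upsampling by `hop_length` are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def build_conditioning(asr_feats, f0_hz, hop_length):
    """Hypothetical helper (not from the paper): fuse frame-level ASR features
    with a melody (F0) contour and upsample them to the audio sample rate."""
    asr_feats = np.asarray(asr_feats, dtype=np.float32)  # shape: (frames, dims)
    f0_hz = np.asarray(f0_hz, dtype=np.float32)          # shape: (frames,)
    if asr_feats.shape[0] != f0_hz.shape[0]:
        raise ValueError("ASR features and F0 must have the same number of frames")

    # Assumed melody encoding: log-F0 on voiced frames (F0 > 0), zero on
    # unvoiced frames, plus a binary voicing flag.
    voiced = f0_hz > 0
    log_f0 = np.zeros_like(f0_hz)
    log_f0[voiced] = np.log(f0_hz[voiced])
    melody = np.stack([log_f0, voiced.astype(np.float32)], axis=1)  # (frames, 2)

    # Concatenate content (ASR) and melody features frame by frame.
    frames = np.concatenate([asr_feats, melody], axis=1)  # (frames, dims + 2)

    # Repeat each frame hop_length times so every audio sample has a
    # conditioning vector (a simple stand-in for learned upsampling).
    return np.repeat(frames, hop_length, axis=0)  # (frames * hop_length, dims + 2)


asr = np.ones((4, 3))       # 4 frames of 3-dimensional placeholder ASR features
f0 = [0.0, 220.0, 440.0, 0.0]  # unvoiced, A3, A4, unvoiced
cond = build_conditioning(asr, f0, hop_length=2)
print(cond.shape)  # (8, 5)
```

Nearest-neighbour repetition keeps the example dependency-free; a real fully convolutional generator would more likely learn its own upsampling, for instance with transposed convolutions.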
