September 30, 2019

Neural Code Search Evaluation Dataset


There has been increasing interest in code search using natural language. Assessing the performance of such code search models can be difficult without a readily available evaluation suite. In this paper, we present an evaluation dataset of natural language query and code snippet pairs to support future work. We also provide the results of two code search models ([6] and [1]) from recent work as a benchmark.
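As a rough illustration of how such a dataset is typically used (the field names and metric below are illustrative assumptions, not the dataset's actual schema), each evaluation pair couples a natural language query with a reference code snippet, and a model is scored by how highly it ranks the reference snippet:

```python
# Hypothetical record layout for one (query, snippet) evaluation pair;
# field names are illustrative, not the dataset's actual schema.
example_pair = {
    "query": "how to convert a map to a list of keys",
    "snippet": "List<String> keys = new ArrayList<>(map.keySet());",
}

def top_k_accuracy(ranked_results, relevant, k=10):
    """Fraction of queries whose reference snippet appears in the
    top-k retrieved results -- a common code search benchmark metric."""
    hits = sum(1 for ranks, rel in zip(ranked_results, relevant)
               if rel in ranks[:k])
    return hits / len(ranked_results)
```

A benchmark run would then report `top_k_accuracy` over all queries for each candidate model.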

By: Hongyu Li, Seohyun Kim, Satish Chandra

September 17, 2019

Unsupervised Singing Voice Conversion


We present a deep learning method for singing voice conversion. The proposed network is not conditioned on the text or on the notes, and it directly converts the audio of one singer to the voice of another. Training is performed without any form of supervision: no lyrics or any kind of phonetic features, no notes, and no matching samples between singers.

By: Eliya Nachmani, Lior Wolf

September 16, 2019

Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR


End-to-end approaches to automatic speech recognition, such as Listen-Attend-Spell (LAS), blend all components of a traditional speech recognizer into a unified model. Although this simplifies training and decoding pipelines, a unified model is hard to adapt when mismatch exists between training and test data, especially if this information is dynamically changing.

By: Zhehuai Chen, Mahaveer Jain, Yongqiang Wang, Michael L. Seltzer, Christian Fuegen

September 15, 2019

wav2vec: Unsupervised Pre-training for Speech Recognition


We explore unsupervised pre-training for speech recognition by learning representations of raw audio. wav2vec is trained on large amounts of unlabeled audio data and the resulting representations are then used to improve acoustic model training. We pre-train a simple multi-layer convolutional neural network optimized via a noise contrastive binary classification task.
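A minimal sketch of a noise contrastive binary classification objective, assuming a context vector and latent representations as plain vectors (the real model uses convolutional encoders and multiple prediction steps; this is a simplification, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def contrastive_loss(context, positive, negatives):
    """Noise-contrastive binary classification: score the true future
    latent representation against sampled distractors."""
    def log_sigmoid(x):
        # numerically stable log(sigmoid(x))
        return -np.logaddexp(0.0, -x)
    pos_score = context @ positive      # score of the true future latent
    neg_scores = negatives @ context    # scores of the distractors
    return -(log_sigmoid(pos_score) + log_sigmoid(-neg_scores).sum())

dim = 16
c = rng.normal(size=dim)            # context network output at time t
z_pos = rng.normal(size=dim)        # latent representation at time t+k
z_neg = rng.normal(size=(10, dim))  # negatives drawn from other timesteps
loss = contrastive_loss(c, z_pos, z_neg)
```

Minimizing this loss pushes the context vector to be predictive of the true future latent while being uninformative about the distractors.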

By: Steffen Schneider, Alexei Baevski, Ronan Collobert, Michael Auli

September 15, 2019

Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions


We propose a fully convolutional sequence-to-sequence encoder architecture with a simple and efficient decoder. Our model improves WER on LibriSpeech while being an order of magnitude more efficient than a strong RNN baseline.

By: Awni Hannun, Ann Lee, Qiantong Xu, Ronan Collobert

September 15, 2019

Who Needs Words? Lexicon-Free Speech Recognition


Lexicon-free speech recognition naturally deals with the problem of out-of-vocabulary (OOV) words. In this paper, we show that character-based language models (LM) can perform as well as word-based LMs for speech recognition, in terms of word error rate (WER), even without restricting the decoding to a lexicon.
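A toy illustration of why character-level LMs avoid the OOV problem (a simple smoothed bigram model for exposition only; the paper's LMs are far larger): any character string can be assigned a finite score, even a word never seen in training.

```python
import math
from collections import Counter

def train_char_bigram(text):
    """Character bigram counts plus unigram counts and vocabulary size."""
    pairs = Counter(zip(text, text[1:]))
    unigrams = Counter(text)
    return pairs, unigrams, len(set(text))

def log_prob(word, model):
    """Add-one-smoothed log probability of a character string."""
    pairs, unigrams, v = model
    lp = 0.0
    for a, b in zip(word, word[1:]):
        # smoothing gives nonzero probability to unseen character pairs
        lp += math.log((pairs[(a, b)] + 1) / (unigrams[a] + v))
    return lp

model = train_char_bigram("the cat sat on the mat")
score = log_prob("hat", model)  # "hat" never occurs in training, yet is scorable
```

A word-based LM with a fixed lexicon would assign such a word zero probability or force a substitution; the character LM simply scores it.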

By: Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert

September 10, 2019

Bridging the Gap Between Relevance Matching and Semantic Matching for Short Text Similarity Modeling

Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose a novel model, HCAN (Hybrid Co-Attention Network), that comprises (1) a hybrid encoder module that includes ConvNet-based and LSTM-based encoders, (2) a relevance matching module that measures soft term matches with importance weighting at multiple granularities, and (3) a semantic matching module with co-attention mechanisms that capture context-aware semantic relatedness.
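As a rough sketch of the soft term matching idea behind the relevance matching module (a single-granularity simplification with made-up inputs, not HCAN's actual architecture): compute cosine similarities between query and document term embeddings, take each query term's best match, and weight by term importance.

```python
import numpy as np

def relevance_matching_score(query_emb, doc_emb, idf):
    """Soft term matching with importance weighting: each query term
    finds its best-matching document term; matches are weighted by
    a per-term importance score (e.g. IDF)."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sim = q @ d.T              # (num_query_terms, num_doc_terms) cosines
    best = sim.max(axis=1)     # best soft match per query term
    return float(best @ idf)   # importance-weighted aggregate

query_emb = np.array([[1.0, 0.0], [0.0, 1.0]])  # toy query term embeddings
doc_emb = np.array([[1.0, 0.0], [0.7, 0.7]])    # toy document term embeddings
idf = np.array([1.5, 0.5])                      # per-term importance weights
score = relevance_matching_score(query_emb, doc_emb, idf)
```

HCAN computes signals like this at multiple encoder granularities and combines them with the semantic matching module's co-attention output.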

By: Jinfeng Rao, Linqing Liu, Yi Tay, Wei Yang, Peng Shi, Jimmy Lin

September 9, 2019

Flexible binaural resynthesis of room impulse responses for augmented reality research

EAA Spatial Audio Signal Processing Symposium (SASP)

A basic building block of audio for Augmented Reality (AR) is the use of virtual sound sources layered on top of real sources present in an environment. For these virtual sources to be perceived as belonging to the natural scene, the room acoustics of the listening space must be replicated carefully. However, it is unclear to what extent the real and virtual room impulse responses (RIRs) need to be matched in order to generate plausible scenes in which virtual sound sources blend seamlessly with real ones. This contribution presents an auralization framework that allows binaural rendering, manipulation, and reproduction of room acoustics in augmented reality scenarios, in order to better understand the perceptual relevance of individual room acoustic parameters.

By: Sebastià V. Amengual Garí, W. Owen Brimijoin, Henrik G. Hassager, Philip W. Robinson
Areas: AR/VR

September 6, 2019

Perceptual comparison of ambisonics-based reverberation methods in binaural listening

EAA Spatial Audio Signal Processing Symposium (SASP)

Reverberation plays a fundamental role in the auralisation of enclosed spaces, as it contributes to the realism and immersiveness of virtual 3D sound scenes. However, rigorous simulation of interactive room acoustics is computationally expensive, and it is common practice to use simplified models at the cost of accuracy. In the present study, two subjective listening tests were carried out to explore trade-offs between algorithmic approach and complexity, on the one hand, and perceived spatialisation quality, on the other, in a binaural spatialisation context.

By: Isaac Engel, Craig Henry, Sebastià V. Amengual Garí, Philip Robinson, David Poirier-Quinot, Lorenzo Picinali
Areas: AR/VR

September 5, 2019

C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion

International Conference on Computer Vision (ICCV)

We propose C3DPO, a method for extracting 3D models of deformable objects from 2D keypoint annotations in unconstrained images. We do so by learning a deep network that reconstructs a 3D object from a single view at a time, accounting for partial occlusions, and explicitly factoring the effects of viewpoint changes and object deformations.

By: David Novotny, Nikhila Ravi, Benjamin Graham, Natalia Neverova, Andrea Vedaldi