April 20, 2020

Direction of Arrival Estimation in Highly Reverberant Environments Using Soft Time-Frequency Mask

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

A recent approach to improving the robustness of sound localization in reverberant environments is based on pre-selection of time-frequency pixels that are dominated by direct sound. This approach is equivalent to applying a binary time-frequency mask prior to the localization stage. Although the binary mask approach was shown to be effective, it may not fully exploit the information available in the captured signal. To overcome this limitation, we propose employing a soft mask instead of the binary mask.

By: Vladimir Tourbabin, Jacob Donley, Boaz Rafaely, Ravish Mehra
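The difference between a binary and a soft time-frequency mask can be illustrated with a minimal sketch, assuming each time-frequency bin already has a direction estimate and a hypothetical "direct-sound dominance" score in [0, 1]. How the paper actually computes its mask is not described here; the scores, threshold, and histogram-peak localizer below are illustrative assumptions only.

```python
# Hypothetical inputs: one azimuth estimate (degrees) per time-frequency bin and a
# per-bin score in [0, 1] for how strongly that bin is dominated by direct sound.

def binary_mask(scores, threshold=0.5):
    """Keep a bin (weight 1) only if its score reaches the threshold; drop it otherwise."""
    return [1.0 if s >= threshold else 0.0 for s in scores]

def soft_mask(scores):
    """Use the (clipped) score itself as a continuous weight, so no bin is discarded outright."""
    return [min(max(s, 0.0), 1.0) for s in scores]

def weighted_doa(azimuths_deg, weights, resolution_deg=5):
    """Return the azimuth at the peak of a weighted direction histogram."""
    n_bins = 360 // resolution_deg
    hist = [0.0] * n_bins
    for az, w in zip(azimuths_deg, weights):
        hist[int(az % 360) // resolution_deg] += w
    peak = max(range(n_bins), key=lambda i: hist[i])
    return peak * resolution_deg

azimuths = [30, 31, 32, 120, 121]
scores = [0.45, 0.45, 0.45, 0.6, 0.1]
print(weighted_doa(azimuths, binary_mask(scores)))  # 120
print(weighted_doa(azimuths, soft_mask(scores)))    # 30
```

With these made-up numbers the binary mask keeps a single bin, while the soft mask still lets the three near-threshold bins contribute, which is the kind of information a hard threshold throws away.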

February 7, 2020

Generate, Segment and Refine: Towards Generic Manipulation Segmentation

Conference on Artificial Intelligence (AAAI)

Detecting manipulated images has become a significant emerging challenge. The advent of image sharing platforms and the easy availability of advanced photo editing software have resulted in large quantities of manipulated images being shared on the internet. While the intent behind such manipulations varies widely, concerns about the spread of false news and misinformation are growing. Current state-of-the-art methods for detecting these manipulated images suffer from a lack of training data due to the laborious labeling process. We address this problem in this paper.

By: Peng Zhou, Bor-Chun Chen, Xintong Han, Mahyar Najibi, Abhinav Shrivastava, Ser Nam Lim, Larry S. Davis

January 13, 2020

Scaling up online speech recognition using ConvNets


We design an online end-to-end speech recognition system based on Time-Depth Separable (TDS) convolutions and Connectionist Temporal Classification (CTC). The system has almost three times the throughput of a well-tuned hybrid ASR baseline while also having lower latency and a better word error rate.

By: Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
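To show what a CTC-trained model's output looks like, here is a minimal sketch of generic CTC best-path (greedy) decoding: take the most likely label per frame, merge consecutive repeats, then remove blanks. This is not the decoder described in the paper; the vocabulary, the blank index 0, and the one-hot frame scores are assumptions for illustration.

```python
from itertools import groupby

BLANK = 0  # assumption: label index 0 is the CTC blank symbol

def ctc_greedy_decode(frame_scores, vocab):
    """Best-path CTC decoding: arg-max label per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(scores)), key=lambda k: scores[k]) for scores in frame_scores]
    collapsed = [label for label, _ in groupby(best_path)]  # merge consecutive repeats
    return "".join(vocab[label] for label in collapsed if label != BLANK)

def one_hot_frames(path, vocab_size):
    """Build toy per-frame scores that put all mass on the given label path."""
    return [[1.0 if k == p else 0.0 for k in range(vocab_size)] for p in path]

vocab = ["<blank>", "c", "a", "t"]
print(ctc_greedy_decode(one_hot_frames([1, 1, 0, 2, 2, 0, 3], 4), vocab))  # cat
print(ctc_greedy_decode(one_hot_frames([1, 2, 2, 0, 2], 4), vocab))        # caa
```

The second call shows why the blank matters: only a blank between two identical labels lets the decoder emit a repeated character.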

January 1, 2020

Designing Safe Spaces for Virtual Reality

Ethics in Design and Communication

Virtual Reality (VR) designers accept the ethical responsibilities of removing a user’s entire world and superseding it with a fabricated reality. These unique immersive design challenges are intensified when virtual experiences become public and socially-driven. As female VR designers in 2018, we see an opportunity to fold the language of consent into the design practice of virtual reality—as a means to design safe, accessible, virtual spaces.

Publication will be made available in 2020.

By: Michelle Cortese, Andrea Zeller

December 16, 2019

Study of 3D Virtual Reality Picture Quality

IEEE Journal of Selected Topics in Signal Processing

Virtual Reality (VR) and its applications have attracted significant and increasing attention. However, the requirements of much larger file sizes, different storage formats, and immersive viewing conditions pose significant challenges to the goals of acquiring, transmitting, compressing and displaying high quality VR content. Towards meeting these challenges, it is important to be able to understand the distortions that arise and that can affect the perceived quality of displayed VR content. It is also important to develop ways to automatically predict VR picture quality. Meeting these challenges requires basic tools in the form of large, representative subjective VR quality databases on which VR quality models can be developed and which can be used to benchmark VR quality prediction algorithms. Towards making progress in this direction, here we present the results of an immersive 3D subjective image quality assessment study.

By: Meixu Chen, Yize Jin, Todd Goodall, Xiangxu Yu, Alan C. Bovik
Areas: AR/VR

December 15, 2019

VPS Tactile Display: Tactile Information Transfer of Vibration, Pressure, and Shear

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT)

One of the challenges in the field of haptics is to provide meaningful and realistic sensations to users. While most real-world tactile sensations are composed of multiple dimensions, most commercial products include only vibration, as it is the most cost-effective solution. To improve on this, we introduce the VPS (Vibration, Pressure, Shear) display, a multi-dimensional tactile array that increases information transfer by combining vibration, pressure, and shear, similar to how an RGB LED combines red, green, and blue to create new colors.

By: Lawrence H. Kim, Pablo Castillo, Sean Follmer, Ali Israr
Areas: AR/VR

December 15, 2019

Multi-Sensory Stimuli Improve Distinguishability of Cutaneous Haptic Cues

IEEE Transactions on Haptics

We present experimental results that demonstrate that rendering haptic cues with multi-sensory components—specifically, lateral skin stretch, radial squeeze, and vibrotactile stimuli—improved perceptual distinguishability in comparison to similar cues with all-vibrotactile components. These results support the incorporation of diverse stimuli, both vibrotactile and non-vibrotactile, for applications requiring large haptic cue sets.

By: Jennifer L. Sullivan, Nathan Dunkelberger, Joshua Bradley, Joseph Young, Ali Israr, Frances Lau, Keith Klumb, Freddy Abnousi, Marcia K. O’Malley
Areas: AR/VR

December 14, 2019

From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition

IEEE Automatic Speech Recognition and Understanding Workshop

There is an implicit assumption that traditional hybrid approaches for automatic speech recognition (ASR) cannot directly model graphemes and need to rely on phonetic lexicons to get competitive performance, especially for English, which has poor grapheme-phoneme correspondence. In this work, we show for the first time that, for English, hybrid ASR systems can in fact model graphemes effectively by leveraging tied context-dependent graphemes, i.e., chenones.

By: Duc Le, Xiaohui Zhang, Weiyi Zhang, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer
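By analogy with triphones, a context-dependent grapheme is a grapheme labeled with its left and right neighbors, and tying maps many such units onto a smaller set of shared units. The sketch below is a hypothetical illustration only: the `left-center+right` notation, the `<w>` word-boundary symbol, and the hand-written tying table are assumptions, and the clustering procedure that would actually produce the tied units is not shown.

```python
def context_dependent_graphemes(word, boundary="<w>"):
    """Expand a word into left-center+right grapheme units (notation is an assumption)."""
    padded = [boundary] + list(word) + [boundary]
    return [f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}" for i in range(1, len(padded) - 1)]

def tie(units, tying_table):
    """Replace each unit by its shared (tied) id; units absent from the table stay as-is."""
    return [tying_table.get(u, u) for u in units]

# Hypothetical tying table: the center "a" in "cat" and "hat" shares one unit.
TYING_TABLE = {"c-a+t": "a_1", "h-a+t": "a_1"}

print(context_dependent_graphemes("cat"))             # ['<w>-c+a', 'c-a+t', 'a-t+<w>']
print(tie(context_dependent_graphemes("hat"), TYING_TABLE))  # ['<w>-h+a', 'a_1', 'a-t+<w>']
```

Tying is what keeps the inventory tractable: without it, every distinct grapheme context would need its own model output.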

December 13, 2019

PerspectiveNet: A Scene-consistent Image Generator for New View Synthesis in Real Indoor Environments

Neural Information Processing Systems (NeurIPS)

Given a set of reference RGBD views of an indoor environment and a new viewpoint, our goal is to predict the view from that location. Prior work on new-view generation has predominantly focused on significantly constrained scenarios, typically involving artificially rendered views of isolated CAD models. Here we tackle a much more challenging version of the problem. We devise an approach that exploits known geometric properties of the scene (per-frame camera extrinsics and depth) in order to warp reference views into the new ones.

By: David Novotny, Benjamin Graham, Jeremy Reizenstein
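Warping a reference view with known depth and extrinsics amounts to back-projecting each pixel to 3D, moving it into the new camera's frame, and re-projecting it. The sketch below shows this for a single pixel under a standard pinhole model; the shared intrinsics `K` (zero skew) and the convention that `R`, `t` map reference-camera coordinates to new-camera coordinates are assumptions not stated in the abstract, and the paper's full pipeline goes well beyond this step.

```python
def warp_pixel(u, v, depth, K, R, t):
    """Map pixel (u, v) with known depth in the reference view to (u', v') in the new view.

    Assumptions: both views share pinhole intrinsics K (3x3, zero skew); R (3x3) and
    t (length 3) take points from reference-camera to new-camera coordinates.
    """
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    # Back-project the pixel to a 3D point in the reference camera frame.
    p = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Rigid transform into the new camera frame: q = R p + t.
    q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    if q[2] <= 0:
        return None  # the point falls behind the new camera
    # Project with the pinhole model.
    return (fx * q[0] / q[2] + cx, fy * q[1] / q[2] + cy)

K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(warp_pixel(50, 50, 2.0, K, I, [0.2, 0.0, 0.0]))  # (60.0, 50.0)
```

Applying this to every reference pixel yields a sparse, possibly hole-filled warped image; handling occlusions and disocclusions is where the harder part of new-view synthesis lies.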

December 13, 2019

Lead2Gold: Towards exploiting the full potential of noisy transcriptions for speech recognition

IEEE Automatic Speech Recognition and Understanding Workshop

The transcriptions used to train an Automatic Speech Recognition (ASR) system may contain errors. Usually, either a quality control stage discards transcriptions with too many errors, or the noisy transcriptions are used as is. We introduce Lead2Gold, a method to train an ASR system that exploits the full potential of noisy transcriptions.

By: Adrien Dufraux, Emmanuel Vincent, Awni Hannun, Armelle Brun, Matthijs Douze