June 14, 2019

2.5D Visual Sound

Conference on Computer Vision and Pattern Recognition (CVPR)

Binaural audio provides a listener with 3D sound sensation, allowing a rich perceptual experience of the scene. However, binaural recordings are scarcely available and require nontrivial expertise and equipment to obtain. We propose to convert common monaural audio into binaural audio by leveraging video.

By: Ruohan Gao, Kristen Grauman

June 14, 2019

Thinking Outside the Pool: Active Training Image Creation for Relative Attributes

Conference on Computer Vision and Pattern Recognition (CVPR)

Current wisdom suggests more labeled image data is always better, and obtaining labels is the bottleneck. Yet curating a pool of sufficiently diverse and informative images is itself a challenge. In particular, training image curation is problematic for fine-grained attributes, where the subtle visual differences of interest may be rare within traditional image sources. We propose an active image generation approach to address this issue.

By: Aron Yu, Kristen Grauman

June 13, 2019

Multi-modal Content Localization in Videos Using Weak Supervision

International Conference on Machine Learning (ICML)

Identifying the temporal segments in a video that contain content relevant to a category or task is a difficult but interesting problem. This has applications in fine-grained video indexing and retrieval. Part of the difficulty in this problem comes from the lack of supervision since large-scale annotation of localized segments containing the content of interest is very expensive. In this paper, we propose to use the category assigned to an entire video as weak supervision to our model.

By: Gourab Kundu, Prahal Arora, Ferdi Adeputra, Polina Kuznetsova, Daniel McKinnon, Michelle Cheung, Larry Anazia, Geoffrey Zweig

June 13, 2019

ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation

Conference on Computer Vision and Pattern Recognition (CVPR)

This paper proposes an efficient neural network (NN) architecture design methodology called Chameleon that honors given resource constraints. Instead of developing new building blocks or using computationally-intensive reinforcement learning algorithms, our approach leverages existing efficient network building blocks and focuses on exploiting hardware traits and adapting computation resources to fit target latency and/or energy constraints.

By: Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, Peter Vajda, Matt Uyttendaele, Niraj K. Jha

June 13, 2019

Discovering Context Effects from Raw Choice Data

International Conference on Machine Learning (ICML)

Many applications in preference learning assume that decisions come from the maximization of a stable utility function. Yet a large experimental literature shows that individual choices and judgements can be affected by “irrelevant” aspects of the context in which they are made. An important class of such contexts is the composition of the choice set. In this work, our goal is to discover such choice set effects from raw choice data.

By: Arjun Seshadri, Alexander Peysakhovich, Johan Ugander

June 12, 2019

“I just want to feel safe”: A Diary Study of Safety Perceptions on Social Media

International AAAI Conference on Web and Social Media (ICWSM)

While prior work has individually explored specific threats to safety—privacy, security, harassment—in this work we more broadly capture and characterize the full breadth of day-to-day experiences that influence users’ overall perceptions of safety on social media.

By: Elissa M. Redmiles, Jessica Bodford, Lindsay Blackwell

June 11, 2019

ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero

International Conference on Machine Learning (ICML)

The AlphaGo, AlphaGo Zero, and AlphaZero series of algorithms are remarkable demonstrations of deep reinforcement learning’s capabilities, achieving superhuman performance in the complex game of Go with progressively increasing autonomy. However, many obstacles remain in the understanding of and usability of these promising approaches by the research community.

By: Yuandong Tian, Jerry Ma, Qucheng Gong, Shubho Sengupta, Zhuoyuan Chen, James Pinkerton, Larry Zitnick

June 11, 2019

Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering

International Conference on Machine Learning (ICML)

We propose a new class of probabilistic neural-symbolic models that have symbolic functional programs as a latent, stochastic variable. Instantiated in the context of visual question answering, our probabilistic formulation offers two key conceptual advantages over prior neural-symbolic models for VQA.

By: Ramakrishna Vedantam, Karan Desai, Stefan Lee, Marcus Rohrbach, Dhruv Batra, Devi Parikh

June 11, 2019

Adversarial Inference for Multi-Sentence Video Description

Conference on Computer Vision and Pattern Recognition (CVPR)

While significant progress has been made in the image captioning task, video description is still in its infancy due to the complex nature of video data. Generating multi-sentence descriptions for long videos is even more challenging. Among the main issues are the fluency and coherence of the generated descriptions, and their relevance to the video.

By: Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach

June 10, 2019

Separating value functions across time-scales

International Conference on Machine Learning (ICML)

In many finite horizon episodic reinforcement learning (RL) settings, it is desirable to optimize for the undiscounted return – in settings like Atari, for instance, the goal is to collect the most points while staying alive in the long run. Yet, it may be difficult (or even intractable) mathematically to learn with this target. As such, temporal discounting is often applied to optimize over a shorter effective planning horizon.

By: Joshua Romoff, Peter Henderson, Ahmed Touati, Emma Brunskill, Joelle Pineau, Yann Ollivier