Publication

Inferring and Executing Programs for Visual Reasoning

International Conference on Computer Vision (ICCV)


Abstract

Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes. As a result, these black-box models often learn to exploit biases in the data rather than learning to perform visual reasoning. Inspired by module networks, this paper proposes a model for visual reasoning that consists of a program generator that constructs an explicit representation of the reasoning process to be performed, and an execution engine that executes the resulting program to produce an answer. Both the program generator and the execution engine are implemented by neural networks, and are trained using a combination of backpropagation and REINFORCE. Using the CLEVR benchmark for visual reasoning, we show that our model significantly outperforms strong baselines and generalizes better in a variety of settings.
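The two-stage design described in the abstract can be illustrated with a small symbolic sketch. This is not the paper's implementation: there, both stages are neural networks trained with backpropagation and REINFORCE, whereas here the program generator is a hand-written lookup table and the execution engine applies plain Python functions to a toy scene. All names (SCENE, MODULES, generate_program, execute, answer) are hypothetical and chosen only for illustration.

```python
# Toy, symbolic stand-in for a "program generator + execution engine" pipeline.
# Assumption: programs are simple linear chains of module names.

SCENE = [
    {"color": "red", "shape": "cube"},
    {"color": "blue", "shape": "sphere"},
    {"color": "red", "shape": "sphere"},
]


def filter_color(color):
    """Build a module that keeps only objects of the given color."""
    return lambda objs: [o for o in objs if o["color"] == color]


def filter_shape(shape):
    """Build a module that keeps only objects of the given shape."""
    return lambda objs: [o for o in objs if o["shape"] == shape]


# Library of reusable modules that a program can refer to by name.
MODULES = {
    "filter_red": filter_color("red"),
    "filter_sphere": filter_shape("sphere"),
    "count": len,
    "exists": lambda objs: len(objs) > 0,
}


def generate_program(question):
    """Stand-in for a learned program generator: a fixed lookup table."""
    table = {
        "How many red things are there?": ["filter_red", "count"],
        "Is there a red sphere?": ["filter_red", "filter_sphere", "exists"],
    }
    return table[question]


def execute(program, scene):
    """Stand-in for an execution engine: apply the named modules in order."""
    state = scene
    for name in program:
        state = MODULES[name](state)
    return state


def answer(question, scene):
    """Generate an explicit program for the question, then execute it."""
    return execute(generate_program(question), scene)
```

Calling answer("How many red things are there?", SCENE) first produces the explicit program ["filter_red", "count"] and then runs it on the scene. Keeping the program as an inspectable intermediate value is the point of the separation: the reasoning steps can be read off directly rather than being hidden inside a single input-to-output mapping.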

