Audio-Visual Instance Discrimination with Cross-Modal Agreement

Conference on Computer Vision and Pattern Recognition (CVPR)


We present a self-supervised learning approach to learn audio-visual representations from video and audio. Our method uses contrastive learning for cross-modal discrimination of video from audio and vice versa. We show that optimizing for cross-modal discrimination, rather than within-modal discrimination, is important for learning good representations from video and audio. With this simple but powerful insight, our method achieves highly competitive performance when fine-tuned on action recognition tasks. Furthermore, while recent work in contrastive learning defines positive and negative samples as individual instances, we generalize this definition by exploring cross-modal agreement. We group together multiple instances as positives by measuring their similarity in both the video and audio feature spaces. Cross-modal agreement creates better positive and negative sets, which allows us to calibrate visual similarities by seeking within-modal discrimination of positive instances, and to achieve significant gains on downstream tasks.
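
The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a batch-wise InfoNCE-style loss with a temperature parameter, and it assumes that cross-modal agreement between two instances is scored as the elementwise minimum of their video-video and audio-audio cosine similarities. The function names (`cross_modal_infonce`, `agreement_positives`), the temperature value, and the choice of `k` are hypothetical and chosen only for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def cross_modal_infonce(video, audio, temperature=0.07):
    """Cross-modal discrimination (assumed InfoNCE form).

    Row i of `video` and row i of `audio` come from the same clip.
    Each video embedding must pick out its own audio among all audios
    in the batch, and vice versa. No video-video or audio-audio terms
    appear, which is what makes the objective cross-modal.
    """
    v = l2_normalize(video)
    a = l2_normalize(audio)
    logits = v @ a.T / temperature          # (N, N) video-to-audio scores
    idx = np.arange(len(v))
    loss_v2a = -log_softmax(logits)[idx, idx].mean()
    loss_a2v = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_v2a + loss_a2v)


def agreement_positives(video, audio, k=2):
    """Pick k extra positives per instance by cross-modal agreement.

    Assumption: two instances agree only if they are similar in BOTH
    feature spaces, scored here as the minimum of the two similarities.
    """
    v = l2_normalize(video)
    a = l2_normalize(audio)
    agreement = np.minimum(v @ v.T, a @ a.T)
    np.fill_diagonal(agreement, -np.inf)    # an instance is not its own neighbor
    return np.argsort(-agreement, axis=1)[:, :k]


# Toy data: audio embeddings are noisy copies of the video embeddings.
rng = np.random.default_rng(0)
video = rng.normal(size=(8, 16))
audio = video + 0.01 * rng.normal(size=(8, 16))

loss_aligned = cross_modal_infonce(video, audio)
loss_shuffled = cross_modal_infonce(video, audio[::-1])  # mismatched pairs
positives = agreement_positives(video, audio, k=2)
```

On this toy data the loss for correctly paired clips should be lower than the loss for mismatched pairs, and `positives` holds, for each instance, the indices of the two other instances that score highest under the assumed agreement measure. Those indices could then be used as an expanded positive set for a within-modal term, in the spirit of the calibration step the abstract describes.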

Related Publications


Uncertainty and Robustness in Deep Learning Workshop at ICML - June 24, 2021

DAIR: Data Augmented Invariant Regularization

Tianjian Huang, Chinnadhurai Sankar, Pooyan Amini, Satwik Kottur, Alborz Geramifard, Meisam Razaviyayn, Ahmad Beirami

CVPR - June 21, 2021

img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation

Vítor Albiero, Xingyu Chen, Xi Yin, Guan Pang, Tal Hassner

AutoML Workshop at NeurIPS - July 18, 2021

Neural Fixed-Point Acceleration for Convex Optimization

Shobha Venkataraman, Brandon Amos

Federated Learning for User Privacy and Data Confidentiality Workshop at ICML - July 24, 2021

Federated Learning with Buffered Asynchronous Aggregation

John Nguyen, Kshitiz Malik, Hongyuan Zhan, Ashkan Yousefpour, Michael Rabbat, Mani Malek, Dzmitry Huba
