
LAMV: Learning to align and match videos with kernelized temporal layers

Computer Vision and Pattern Recognition (CVPR)


Abstract

This paper considers a learnable approach for comparing and aligning videos. Our architecture builds upon and revisits temporal match kernels within neural networks: we propose a new temporal layer that finds temporal alignments by maximizing the scores between two sequences of vectors, according to a time-sensitive similarity metric parametrized in the Fourier domain. We learn this layer with a temporal proposal strategy, in which we minimize a triplet loss that takes into account both the localization accuracy and the recognition rate.
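The core idea, scoring all temporal offsets between two descriptor sequences via the Fourier domain, can be sketched minimally. The code below is a hypothetical illustration using plain circular cross-correlation with the FFT; the paper's actual kernelized temporal layer additionally learns a time-sensitive similarity metric parametrized in the Fourier domain, which is omitted here.

```python
import numpy as np

def alignment_scores(x, y):
    """Score every circular temporal shift between two descriptor
    sequences x and y of shape (T, d), using cross-correlation
    computed in the Fourier domain (illustrative sketch only; the
    learned per-frequency parametrization from the paper is omitted)."""
    X = np.fft.fft(x, axis=0)                        # temporal FFT per dimension
    Y = np.fft.fft(y, axis=0)
    corr = np.fft.ifft(X * np.conj(Y), axis=0).real  # (T, d) correlations
    return corr.sum(axis=1)                          # one score per shift

def best_alignment(x, y):
    """Return the temporal shift maximizing the alignment score."""
    scores = alignment_scores(x, y)
    return int(np.argmax(scores)), float(np.max(scores))

# An identical pair aligns at shift 0 with the highest possible score.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 16))
shift, score = best_alignment(x, x)
```

Computing the scores for all T shifts this way costs O(T log T) per descriptor dimension instead of O(T^2), which is what makes exhaustive alignment search tractable inside a network layer.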

We evaluate our approach on video alignment, copy detection, and event retrieval. Our approach outperforms the state of the art on temporal video alignment and video copy detection datasets in comparable setups. It also attains the best reported results for particular event search, while precisely aligning videos.
