Publication

Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline

European Conference on Computer Vision (ECCV)


Abstract

Prior work in visual dialog has focused on training deep neural models on VisDial in isolation. Instead, we present an approach to leverage pretraining on related vision-language datasets before transferring to visual dialog. We adapt the recently proposed ViLBERT model for multi-turn visually-grounded conversations. Our model is pretrained on the Conceptual Captions and Visual Question Answering datasets, and finetuned on VisDial. Our best single model outperforms prior published work by > 1% absolute on NDCG and MRR.

Next, we find that additional finetuning using “dense” annotations in VisDial leads to even higher NDCG – more than 10% over our base model – but hurts MRR – more than 17% below our base model! This highlights a trade-off between the two primary metrics – NDCG and MRR – which we find is due to dense annotations not correlating well with the original ground-truth answers to questions.

Related Publications

All Publications

SIGDIAL - August 1, 2021

Annotation Inconsistency and Entity Bias in MultiWOZ

Kun Qian, Ahmad Berrami, Zhouhan Lin, Ankita De, Alborz Geramifard, Zhou Yu, Chinnadhurai Sankar

Uncertainty and Robustness in Deep Learning Workshop at ICML - August 1, 2020

Tilted Empirical Risk Minimization

Tian Li, Ahmad Beirami, Maziar Sanjabi, Virginia Smith

arxiv - November 1, 2020

The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes

Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, Davide Testuggine

ICML - July 24, 2021

Using Bifurcations for Diversity in Differentiable Games

Jonathan Lorraine, Jack Parker-Holder, Paul Vicol, Aldo Pacchiano, Luke Metz, Tal Kachman, Jakob Foerster

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookies Policy