June 16, 2019

On the Idiosyncrasies of the Mandarin Chinese Classifier System

North American Chapter of the Association for Computational Linguistics (NAACL)

While idiosyncrasies of the Chinese classifier system have been a richly studied topic among linguists (Adams and Conklin, 1973; Erbaugh, 1986; Lakoff, 1986), not much work has been done to quantify them with statistical methods. In this paper, we introduce an information-theoretic approach to measuring idiosyncrasy; we examine how much the uncertainty in Mandarin Chinese classifiers can be reduced by knowing semantic information about the nouns that the classifiers modify.
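The uncertainty reduction described here is naturally expressed as mutual information, I(N; C) = H(C) − H(C | N): the entropy of the classifier distribution minus its expected entropy once the noun's semantic class is known. A minimal sketch of that computation, using toy (noun category, classifier) counts invented purely for illustration (not data from the paper):

```python
from collections import Counter
from math import log2

# Toy (noun_category, classifier) pairs -- hypothetical, for illustration only.
pairs = [
    ("animal", "zhi"), ("animal", "zhi"), ("animal", "tou"),
    ("flat",   "zhang"), ("flat", "zhang"), ("flat", "ge"),
    ("long",   "tiao"), ("long", "tiao"), ("long", "ge"),
]

def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

# H(C): uncertainty over classifiers with no knowledge of the noun.
h_c = entropy(Counter(c for _, c in pairs).values())

# H(C | N): expected classifier uncertainty once the noun category is known.
n_total = len(pairs)
h_c_given_n = 0.0
for noun in {n for n, _ in pairs}:
    sub = [c for n, c in pairs if n == noun]
    h_c_given_n += len(sub) / n_total * entropy(Counter(sub).values())

# I(N; C): bits of classifier uncertainty eliminated by the noun's semantics.
mi = h_c - h_c_given_n
print(f"H(C)={h_c:.3f}  H(C|N)={h_c_given_n:.3f}  I(N;C)={mi:.3f}")
```

A fully idiosyncratic system would leave I(N; C) near zero (noun semantics tell you nothing about classifier choice), while a fully transparent one would drive H(C | N) toward zero.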

By: Shijia Liu, Hongyuan Mei, Adina Williams, Ryan Cotterell

June 16, 2019

Towards VQA Models That Can Read

Conference on Computer Vision and Pattern Recognition (CVPR)

Studies have shown that a dominant class of questions asked by visually impaired users on images of their surroundings involves reading text in the image. But today’s VQA models cannot read! Our paper takes a first step towards addressing this problem.

By: Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, Marcus Rohrbach

June 13, 2019

Multi-modal Content Localization in Videos Using Weak Supervision

International Conference on Machine Learning (ICML)

Identifying the temporal segments in a video that contain content relevant to a category or task is a difficult but interesting problem, with applications in fine-grained video indexing and retrieval. Part of the difficulty comes from the lack of supervision, since large-scale annotation of localized segments containing the content of interest is very expensive. In this paper, we propose to use the category assigned to an entire video as weak supervision for our model.

By: Gourab Kundu, Prahal Arora, Ferdi Adeputra, Polina Kuznetsova, Daniel McKinnon, Michelle Cheung, Larry Anazia, Geoffrey Zweig

June 11, 2019

Adversarial Inference for Multi-Sentence Video Description

Conference on Computer Vision and Pattern Recognition (CVPR)

While significant progress has been made in the image captioning task, video description is still in its infancy due to the complex nature of video data. Generating multi-sentence descriptions for long videos is even more challenging. Among the main issues are the fluency and coherence of the generated descriptions, and their relevance to the video.

By: Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach

June 10, 2019

Non-Monotonic Sequential Text Generation

International Conference on Machine Learning (ICML)

Standard sequential generation methods assume a pre-specified generation order, such as text generation methods which generate words from left to right. In this work, we propose a framework for training models of text generation that operate in non-monotonic orders; the model directly learns good orders, without any additional annotation.

By: Sean Welleck, Kianté Brantley, Hal Daumé III, Kyunghyun Cho

June 10, 2019

On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models

North American Chapter of the Association for Computational Linguistics (NAACL)

Adversarial examples — perturbations to the input of a model that elicit large changes in the output — have been shown to be an effective way of assessing the robustness of sequence-to-sequence (seq2seq) models. However, these perturbations only indicate weaknesses in the model if they do not change the input so significantly that it legitimately results in changes in the expected output.

By: Paul Michel, Xian Li, Graham Neubig, Juan Pino

June 10, 2019

Mixture Models for Diverse Machine Translation: Tricks of the Trade

International Conference on Machine Learning (ICML)

We develop an evaluation protocol to assess both quality and diversity of generations against multiple references, and provide an extensive empirical study of several mixture model variants. Our analysis shows that certain types of mixture models are more robust and offer the best trade-off between translation quality and diversity compared to variational models and diverse decoding approaches.

By: Tianxiao Shen, Myle Ott, Michael Auli, Marc'Aurelio Ranzato

June 7, 2019

Cycle-Consistency for Robust Visual Question Answering

Conference on Computer Vision and Pattern Recognition (CVPR)

Despite significant progress in Visual Question Answering over the years, the robustness of today’s VQA models leaves much to be desired. We introduce a new evaluation protocol and associated dataset (VQA-Rephrasings) and show that state-of-the-art VQA models are notoriously brittle to linguistic variations in questions.

By: Meet Shah, Xinlei Chen, Marcus Rohrbach, Devi Parikh

June 5, 2019

CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog

North American Chapter of the Association for Computational Linguistics (NAACL)

We develop CLEVR-Dialog, a large diagnostic dataset for studying multi-round reasoning in visual dialog. Specifically, we construct a dialog grammar that is grounded in the scene graphs of the images from the CLEVR dataset.

By: Satwik Kottur, José M.F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach

June 4, 2019

Engaging Image Captioning via Personality

Conference on Computer Vision and Pattern Recognition (CVPR)

Standard image captioning tasks such as COCO and Flickr30k are factual, neutral in tone, and (to a human) state the obvious (e.g., “a man playing a guitar”). While such tasks are useful for verifying that a machine understands the content of an image, they are not engaging to humans as captions. With this in mind, we define a new task, PERSONALITY-CAPTIONS, where the goal is to be as engaging to humans as possible by incorporating controllable style and personality traits.

By: Kurt Shuster, Samuel Humeau, Hexiang Hu, Antoine Bordes, Jason Weston