Research Area
Year Published

201 Results

February 7, 2020

Generate, Segment and Refine: Towards Generic Manipulation Segmentation

Conference on Artificial Intelligence (AAAI)

Detecting manipulated images has become a significant emerging challenge. The advent of image sharing platforms and the easy availability of advanced photo editing software have resulted in a large quantities of manipulated images being shared on the internet. While the intent behind such manipulations varies widely, concerns on the spread of false news and misinformation is growing. Current state of the art methods for detecting these manipulated images suffers from the lack of training data due to the laborious labeling process. We address this problem in this paper.

By: Peng Zhou, Bor-Chun Chen, Xintong Han, Mahyar Najibi, Abhinav Shrivastava, Ser Nam Lim, Larry S. Davis

December 13, 2019

PerspectiveNet: A Scene-consistent Image Generator for New View Synthesis in Real Indoor Environments

Neural Information Processing Systems (NeurIPS)

Given a set of a reference RGBD views of an indoor environment, and a new viewpoint, our goal is to predict the view from that location. Prior work on new-view generation has predominantly focused on significantly constrained scenarios, typically involving artificially rendered views of isolated CAD models. Here we tackle a much more challenging version of the problem. We devise an approach that exploits known geometric properties of the scene (per-frame camera extrinsics and depth) in order to warp reference views into the new ones.

By: David Novotny, Benjamin Graham, Jeremy Reizenstein

December 11, 2019

Finding the Needle in the Haystack with Convolutions: on the benefits of architectural bias

Neural Information Processing Systems (NeurIPS)

Despite the phenomenal success of deep neural networks in a broad range of learning tasks, there is a lack of theory to understand the way they work. In particular, Convolutional Neural Networks (CNNs) are known to perform much better than Fully-Connected Networks (FCNs) on spatially structured data: the architectural structure of CNNs benefits from prior knowledge on the features of the data, for instance their translation invariance. The aim of this work is to understand this fact through the lens of dynamics in the loss landscape.

By: Stéphane d'Ascoli, Levent Sagun, Joan Bruna, Giulio Biroli

December 10, 2019

One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers

Neural Information Processing Systems (NeurIPS)

The success of lottery ticket initializations [7] suggests that small, sparsified networks can be trained so long as the network is initialized appropriately. Unfortunately, finding these “winning ticket” initializations is computationally expensive. One potential solution is to reuse the same winning tickets across a variety of datasets and optimizers. However, the generality of winning ticket initializations remains unclear. Here, we attempt to answer this question by generating winning tickets for one training configuration (optimizer and dataset) and evaluating their performance on another configuration.

By: Ari Morcos, Haonan Yu, Michela Paganini, Yuandong Tian

December 9, 2019

Fixing the train-test resolution discrepancy

Neural Information Processing Systems (NeurIPS)

Data-augmentation is key to the training of neural networks for image classification. This paper first shows that existing augmentations induce a significant discrepancy between the size of the objects seen by the classifier at train and test time: in fact, a lower train resolution improves the classification at test time!

By: Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou

December 9, 2019

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

Neural Information Processing Systems (NeurIPS)

We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language. We extend the popular BERT architecture to a multi-modal two-stream model, processing both visual and textual inputs in separate streams that interact through co-attentional transformer layers.

By: Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee

December 9, 2019

Cold Case: the Lost MNIST Digits

Neural Information Processing Systems (NeurIPS)

Although the popular MNIST dataset [LeCun et al., 1994] is derived from the NIST database [Grother and Hanaoka, 1995], the precise processing steps for this derivation have been lost to time. We propose a reconstruction that is accurate enough to serve as a replacement for the MNIST dataset, with insignificant changes in accuracy. We trace each MNIST digit to its NIST source and its rich metadata such as writer identifier, partition identifier, etc.

By: Chhavi Yadav, Leon Bottou

December 8, 2019

Chasing Ghosts: Instruction Following as Bayesian State Tracking

Neural Information Processing Systems (NeurIPS)

A visually-grounded navigation instruction can be interpreted as a sequence of expected observations and actions an agent following the correct trajectory would encounter and perform. Based on this intuition, we formulate the problem of finding the goal location in Vision-and-Language Navigation (VLN) [1] within the framework of Bayesian state tracking – learning observation and motion models conditioned on these expectable events.

By: Peter Anderson, Ayush Shrivastava, Devi Parikh, Dhruv Batra, Stefan Lee

December 8, 2019

RUBi: Reducing Unimodal Biases for Visual Question Answering

Neural Information Processing Systems (NeurIPS)

Visual Question Answering (VQA) is the task of answering questions about an image. Some VQA models often exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer from a huge drop in performance when evaluated on data outside their training set distribution. This critical issue makes them unsuitable for real-world settings. We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e. examples that can be correctly classified without looking at the image.

By: Remi Cadene, Corentin Dancette, Hedi Ben-younes, Matthieu Cord, Devi Parikh

December 8, 2019

Cross-channel Communication Networks

Neural Information Processing Systems (NeurIPS)

Convolutional neural networks process input data by sending channel-wise feature response maps to subsequent layers. While a lot of progress has been made by making networks deeper, information from each channel can only be propagated from lower levels to higher levels in a hierarchical feed-forward manner. When viewing each filter in the convolutional layer as a neuron, those neurons are not communicating explicitly within each layer in CNNs. We introduce a novel network unit called Cross-channel Communication (C3) block, a simple yet effective module to encourage the neuron communication within the same layer.

By: Jianwei Yang, Zhile Ren, Hongyuan Zhu, Ji Lin, Chuang Gan, Devi Parikh