October 30, 2017

Crowd Intelligence Enhances Automated Mobile Testing

Automated Software Engineering Conference (ASE)

We show that information extracted from crowd-based testing can enhance automated mobile testing. We introduce POLARIZ, which generates replicable test scripts from crowd-based testing, extracting cross-app ‘motif’ events: automatically inferred, reusable higher-level event sequences composed of lower-level observed event actions. Our empirical study used 434 crowd workers from Mechanical Turk to perform 1,350 testing tasks on 9 popular Google Play apps, each with at least 1 million user installs.
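
As a rough illustration of the motif idea, the sketch below mines event n-grams that recur across traces from different apps. It is only a toy: the function and trace names are hypothetical, and POLARIZ's actual inference is considerably more sophisticated.

    def mine_motifs(traces, n=3, min_apps=2):
        """Toy motif mining: keep event n-grams seen in >= min_apps apps."""
        seen_in = {}  # n-gram -> set of app ids it appears in
        for app_id, events in traces:
            for i in range(len(events) - n + 1):
                gram = tuple(events[i:i + n])
                seen_in.setdefault(gram, set()).add(app_id)
        return [g for g, apps in seen_in.items() if len(apps) >= min_apps]

    # Hypothetical traces of low-level observed event actions.
    traces = [
        ("app_a", ["tap_menu", "tap_settings", "toggle", "back"]),
        ("app_b", ["tap_menu", "tap_settings", "toggle", "scroll"]),
    ]
    print(mine_motifs(traces))  # [('tap_menu', 'tap_settings', 'toggle')]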

Ke Mao, Mark Harman, Yue Jia
October 25, 2017

DodecaPen: Accurate 6DoF Tracking of a Passive Stylus

ACM Symposium on User Interface Software and Technology (UIST)

We propose a system for real-time six degrees of freedom (6DoF) tracking of a passive stylus that achieves sub-millimeter accuracy, which is suitable for writing or drawing in mixed reality applications.
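
A core step in marker-based 6DoF tracking of this kind is recovering pose from 2D-3D correspondences (PnP). The sketch below shows just that step with OpenCV; the point coordinates and camera intrinsics are made-up stand-ins, and the actual DodecaPen pipeline adds refinement stages beyond this.

    import numpy as np
    import cv2

    # Hypothetical 3D marker corners in the stylus frame (meters) and
    # their detected 2D image locations (pixels).
    object_points = np.array([[0.00, 0.00, 0.00],
                              [0.01, 0.00, 0.00],
                              [0.01, 0.01, 0.00],
                              [0.00, 0.01, 0.00]])
    image_points = np.array([[320.0, 240.0],
                             [360.0, 242.0],
                             [358.0, 280.0],
                             [318.0, 278.0]])
    K = np.array([[800.0, 0.0, 320.0],   # made-up camera intrinsics
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
    dist = np.zeros(5)                    # assume no lens distortion

    ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
    # rvec/tvec: the 6DoF pose of the stylus in the camera frame.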

Po-Chen Wu, Robert Wang, Kenrick Kin, Christopher Twigg, Shangchen Han, Ming-Hsuan Yang, Shao-Yi Chien
October 22, 2017

Mask R-CNN

International Conference on Computer Vision (ICCV)

We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.
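
Mask R-CNN later shipped as a reference implementation in torchvision, so a minimal inference sketch is possible without reimplementing the paper (the random tensor stands in for a real RGB image):

    import torch
    import torchvision

    # Pretrained Mask R-CNN with a ResNet-50 FPN backbone.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    model.eval()

    image = torch.rand(3, 480, 640)  # stand-in for an RGB image in [0, 1]
    with torch.no_grad():
        (pred,) = model([image])

    # Per-instance outputs: boxes from the recognition branch, masks from
    # the parallel mask-prediction branch the abstract describes.
    print(pred["boxes"].shape, pred["labels"].shape, pred["masks"].shape)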

Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick
October 22, 2017

Unsupervised Creation of Parameterized Avatars

International Conference on Computer Vision (ICCV)

We study the problem of mapping an input image to a tied pair consisting of a vector of parameters and an image that is created using a graphical engine from the vector of parameters. The mapping’s objective is to have the output image as similar as possible to the input image. During training, no supervision is given in the form of matching inputs and outputs.

Lior Wolf, Yaniv Taigman, Adam Polyak
October 22, 2017

Learning to Reason: End-to-End Module Networks for Visual Question Answering

International Conference on Computer Vision (ICCV)

Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems. For example, to answer “is there an equal number of balls and boxes?” we can look for balls, look for boxes, count them, and compare the results. The recently proposed Neural Module Network (NMN) architecture implements this approach to question answering by parsing questions into linguistic substructures and assembling question-specific deep networks from smaller modules that each solve one subtask.
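
The compositional idea can be shown with toy, hand-written modules over a symbolic scene (in the real NMN the modules are neural, operate on image attention maps, and the layout comes from a parser):

    import numpy as np

    scene = ["ball", "ball", "box", "ball"]   # hypothetical scene

    def find(category):            # attend to objects of one category
        return np.array([o == category for o in scene], dtype=float)

    def count(attention):          # reduce an attention map to a number
        return attention.sum()

    def compare(a, b):             # answer module
        return "yes" if a == b else "no"

    # Layout for "is there an equal number of balls and boxes?"
    print(compare(count(find("ball")), count(find("box"))))  # "no"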

Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko
October 22, 2017

Segmentation-Aware Convolutional Networks using Local Attention Masks

International Conference on Computer Vision (ICCV)

We introduce an approach to integrate segmentation information within a convolutional neural network (CNN). This counteracts the tendency of CNNs to smooth information across regions and increases their spatial precision. To obtain segmentation information, we set up a CNN to provide an embedding space where region co-membership can be estimated based on Euclidean distance. We use these embeddings to compute a local attention mask relative to every neuron position.
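
A minimal sketch of that idea, assuming per-pixel embeddings are already available (the neighborhood shape and the exp(-distance) weighting here are illustrative, not the paper's exact formulation):

    import torch

    def local_attention_mask(emb, i, j, k=1, alpha=1.0):
        """Mask over the (2k+1)x(2k+1) neighborhood of pixel (i, j),
        from Euclidean distances in embedding space. emb: (C, H, W)."""
        center = emb[:, i, j]
        patch = emb[:, i - k:i + k + 1, j - k:j + k + 1]
        dist = (patch - center[:, None, None]).pow(2).sum(0).sqrt()
        return torch.exp(-alpha * dist)   # 1 at the center, small across regions

    emb = torch.randn(8, 16, 16)          # random stand-in embeddings
    print(local_attention_mask(emb, 8, 8).shape)  # torch.Size([3, 3])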

Adam W. Harley, Konstantinos G. Derpanis
October 22, 2017

Focal Loss for Dense Object Detection

International Conference on Computer Vision (ICCV)

The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case.
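
The paper's answer (named in the title, though not quoted in this excerpt) is the focal loss, which down-weights well-classified examples so dense training is not swamped by easy negatives. A straightforward implementation of FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), with the paper's typical gamma=2, alpha=0.25:

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
        """Binary focal loss; logits and targets share a shape, targets in {0, 1}."""
        ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p = torch.sigmoid(logits)
        p_t = p * targets + (1 - p) * (1 - targets)          # prob of true class
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        return (alpha_t * (1 - p_t) ** gamma * ce).mean()

    logits = torch.randn(4, 10)
    targets = torch.randint(0, 2, (4, 10)).float()
    print(focal_loss(logits, targets))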

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár
October 22, 2017

Low-shot Visual Recognition by Shrinking and Hallucinating Features

International Conference on Computer Vision (ICCV)

Low-shot visual learning—the ability to recognize novel object categories from very few examples—is a hallmark of human visual intelligence. Existing machine learning approaches fail to generalize in the same way. To make progress on this foundational problem, we present a low-shot learning benchmark on complex images that mimics challenges faced by recognition systems in the wild. We then propose (1) representation regularization techniques, and (2) techniques to hallucinate additional training examples for data-starved classes. Together, our methods improve the effectiveness of convolutional networks in low-shot learning, improving the one-shot accuracy on novel classes by 2.3× on the challenging ImageNet dataset.
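
One crude way to picture hallucination is to transplant within-class variation from a data-rich base class onto a single novel-class example. The sketch below does exactly that with difference vectors; it is only loosely in the spirit of the paper, which learns the hallucinator rather than hard-coding it:

    import numpy as np

    def hallucinate(seed, base_features, n_new=5, rng=np.random):
        """seed: (D,) one novel-class feature; base_features: (N, D)."""
        idx = rng.choice(len(base_features), size=(n_new, 2))
        deltas = base_features[idx[:, 0]] - base_features[idx[:, 1]]
        return seed[None, :] + deltas     # (n_new, D) synthetic examples

    base = np.random.randn(100, 64)       # many base-class features
    seed = np.random.randn(64)            # one novel-class feature
    print(hallucinate(seed, base).shape)  # (5, 64)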

Bharath Hariharan, Ross Girshick
October 22, 2017

Dense and Low-Rank Gaussian CRFs using Deep Embeddings

International Conference on Computer Vision (ICCV)

In this work we introduce a structured prediction model that endows the Deep Gaussian Conditional Random Field (G-CRF) with a […]

Siddhartha Chandra, Nicolas Usunier
October 22, 2017

Inferring and Executing Programs for Visual Reasoning

International Conference on Computer Vision (ICCV)

Inspired by module networks, this paper proposes a model for visual reasoning that consists of a program generator that constructs an explicit representation of the reasoning process to be performed, and an execution engine that executes the resulting program to produce an answer.
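
Reusing the toy scene from the NMN sketch above, the generator/engine split can be caricatured as a program (a token sequence) run by a small stack machine; in the paper the program comes from a learned sequence model and the modules are neural:

    scene = ["ball", "ball", "box", "ball"]   # hypothetical scene

    modules = {
        "find_ball": lambda s: s.append([o == "ball" for o in scene]),
        "find_box":  lambda s: s.append([o == "box" for o in scene]),
        "count":     lambda s: s.append(sum(s.pop())),
        "equal":     lambda s: s.append(s.pop() == s.pop()),
    }

    def execute(program):
        stack = []
        for token in program:     # postfix order: arguments before operator
            modules[token](stack)
        return stack.pop()

    print(execute(["find_ball", "count", "find_box", "count", "equal"]))  # False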

Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, Larry Zitnick, Ross Girshick
October 22, 2017

Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

International Conference on Computer Vision (ICCV)

We introduce the first goal-driven training for visual question answering and dialog agents. Specifically, we pose a cooperative ‘image guessing’ game between two agents – Q-BOT and A-BOT – who communicate in natural language dialog so that Q-BOT can select an unseen image from a lineup of images. We use deep reinforcement learning (RL) to learn the policies of these agents end-to-end – from pixels to multi-agent multi-round dialog to game reward. We demonstrate two experimental results.
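
At its simplest, the end-to-end training signal reduces to REINFORCE on the game reward. A minimal single step is sketched below; the toy linear policy choosing among 10 canned utterances is a placeholder for the paper's recurrent dialog agents:

    import torch

    policy = torch.nn.Linear(16, 10)              # toy utterance policy
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    state = torch.randn(1, 16)                    # stand-in dialog state
    dist = torch.distributions.Categorical(logits=policy(state))
    action = dist.sample()                        # pick an utterance
    reward = torch.tensor(1.0)                    # game reward, e.g. guess quality

    loss = -(dist.log_prob(action) * reward).mean()   # REINFORCE objective
    opt.zero_grad()
    loss.backward()
    opt.step()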

Abhishek Das, Satwik Kottur, José M.F. Moura, Stefan Lee, Dhruv Batra
October 22, 2017

Predicting Deeper into the Future of Semantic Segmentation

International Conference on Computer Vision (ICCV)

The ability to predict and therefore to anticipate the future is an important attribute of intelligence. It is also of utmost importance in real-time systems, e.g. in robotics or autonomous driving, which depend on visual scene understanding for decision making. While prediction of the raw RGB pixel values in future video frames has been studied in previous work, here we introduce the novel task of predicting semantic segmentations of future frames. Given a sequence of video frames, our goal is to predict segmentation maps of not yet observed video frames that lie up to a second or further in the future.
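
The task setup can be sketched as a model that maps the last few segmentation maps to the next one and is rolled out autoregressively to reach further horizons (the context length and toy conv net below are assumptions, not the paper's architecture):

    import torch
    import torch.nn as nn

    C, H, W, CONTEXT = 21, 64, 64, 4   # classes, frame size, input frames

    net = nn.Sequential(               # toy predictor: 4 soft maps -> next map
        nn.Conv2d(CONTEXT * C, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, C, 3, padding=1),
    )

    frames = [torch.softmax(torch.randn(1, C, H, W), 1) for _ in range(CONTEXT)]

    for step in range(3):              # feed predictions back in to go further
        x = torch.cat(frames[-CONTEXT:], dim=1)
        frames.append(torch.softmax(net(x), dim=1))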

Pauline Luc, Natalia Neverova, Camille Couprie, Jakob Verbeek, Yann LeCun
October 22, 2017

Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training

International Conference on Computer Vision (ICCV)

While strong progress has been made in image captioning recently, machine and human captions are still quite distinct. This is primarily due to deficiencies in the generated word distribution and vocabulary size, and to a strong bias in the generators towards frequent captions. Furthermore, humans – rightfully so – generate multiple, diverse captions, due to the inherent ambiguity in the captioning task, which is not explicitly considered in today’s systems. To address these challenges, we change the training objective of the caption generator from reproducing ground-truth captions to generating a set of captions that is indistinguishable from human-written captions.
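
The shifted objective is GAN-style: the generator is rewarded for captions a discriminator cannot tell from human ones. The loss wiring below is a minimal sketch over made-up caption embeddings; real caption GANs need extras (e.g. differentiable sampling of discrete words) that are omitted here:

    import torch
    import torch.nn.functional as F

    D = torch.nn.Linear(128, 1)   # caption embedding -> "human?" logit

    fake = torch.randn(8, 128, requires_grad=True)   # generator's captions
    real = torch.randn(8, 128)                       # human captions

    # Discriminator: separate human from machine captions.
    d_loss = (F.binary_cross_entropy_with_logits(D(real), torch.ones(8, 1)) +
              F.binary_cross_entropy_with_logits(D(fake.detach()),
                                                 torch.zeros(8, 1)))
    # Generator: produce captions the discriminator accepts as human.
    g_loss = F.binary_cross_entropy_with_logits(D(fake), torch.ones(8, 1))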

Rakshith Shetty, Marcus Rohrbach, Lisa Anne Hendricks, Mario Fritz, Bernt Schiele
October 8, 2017

Living a Discrete Life in a Continuous World: Reference in Cross-Modal Entity Tracking

Proceedings of the 12th International Conference on Computational Semantics (IWCS)

This paper (a) introduces a concrete referential task, cross-modal entity tracking, to test both aspects; and (b) proposes a neural network architecture that uses external memory to build an entity library, inspired by the DRSs of DRT, with a mechanism to dynamically introduce new referents or add information to referents that are already in the library.
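
The two memory operations, introducing a referent and adding information to an existing one, can be caricatured with a plain dictionary (the paper's entity library is a differentiable external memory with soft addressing, not a symbolic table):

    def process_mention(library, referent, attributes):
        """Toy entity library: referent id -> accumulated attributes."""
        if referent not in library:        # introduce a new referent
            library[referent] = set()
        library[referent] |= attributes    # add information to it

    library = {}
    process_mention(library, "dog_1", {"brown"})
    process_mention(library, "dog_1", {"small"})   # same referent, new info
    process_mention(library, "cat_1", {"black"})   # new referent
    print(library)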

Gemma Boleda, Sebastian Padó, Nghia The Pham, Marco Baroni
October 5, 2017

STARDATA: a StarCraft AI Research Dataset

AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE)

We release a dataset of 65,646 StarCraft replays that contains 1,535 million frames and 496 million player actions. We provide full game state data along with the original replays, which can be viewed in StarCraft. The game state data was recorded every 3 frames, which ensures suitability for a wide variety of machine learning tasks such as strategy classification, inverse reinforcement learning, imitation learning, forward modeling, partial information extraction, and others. We illustrate the diversity of the data with various statistics and provide examples of tasks that benefit from the dataset.

Zeming Lin, Jonas Gehring, Vasil Khalidov, Gabriel Synnaeve
September 24, 2017

Propagation of Joint Space Quantization Error to Operational Space Coordinates and Their Derivatives

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

In this paper, we characterize encoder error in a robotic system. Given encoder specifications, robot kinematics, and discrete transfer functions mapping coordinates to their derivatives, we describe the propagation of quantization error on joint space coordinates to operational space coordinates, joint space coordinate derivatives, and operational space coordinate derivatives.
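
At first order, the setting can be summarized with standard kinematics (a sketch of the quantities involved, not the paper's full treatment of the discrete transfer functions):

    % Quantization bound per encoder, and its first-order image in
    % operational space via the manipulator Jacobian J(q):
    \[
      |\delta q_i| \le \frac{\Delta_i}{2}, \qquad
      \delta x \approx J(q)\,\delta q, \qquad
      \|\delta x\| \le \|J(q)\|\,\|\delta q\|,
    \]
    % where \Delta_i is the resolution of encoder i. Discrete
    % differentiation then further amplifies \delta q in the velocity and
    % acceleration estimates.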

Nick Colonnese, Allison M. Okamura
September 20, 2017

Characterizing Large-Scale Production Reliability for 100G Optical Interconnect in Facebook Data Centers

Frontiers in Optics / Laser Science (FiO/LS)

Facebook is deploying cost-effective 100G CWDM4 transceivers in data centers. This paper describes the post-production performance monitoring system that is being implemented to identify optical interconnect early failure modes.

Abhijit Chakravarty, Srinivasan Giridharan, Matt Kelly, Ashwin Poojary, Vincent Zeng
September 20, 2017

100Gb/s CWDM4 Optical Interconnect at Facebook Data Centers for Bandwidth Enhancement

Frontiers in Optics / Laser Science (FiO/LS)

Facebook has developed 100G data centers from the ground up by fine-tuning optical technologies, optimizing link budgets, limiting operating temperatures, and ultimately improving manufacturability. 100G-CWDM4 is an effective technology for enabling connectivity over duplex single-mode fiber.

Abhijit Chakravarty, Katharine Schmidtke, Vincent Zeng, Srinivasan Giridharan, Cathie Deal, Reza Niazmand
September 9, 2017

Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, we show how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors (Kiros et al., 2015) on a wide range of transfer tasks.
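
The paper's best-performing encoder is a bidirectional LSTM with max pooling over time; a minimal sketch (hyperparameters illustrative), along with the NLI feature combination the paper uses for the classifier:

    import torch
    import torch.nn as nn

    class BiLSTMMaxEncoder(nn.Module):
        """BiLSTM over word embeddings, max-pooled into a sentence vector."""
        def __init__(self, emb_dim=300, hidden=2048):
            super().__init__()
            self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True,
                                batch_first=True)

        def forward(self, word_embs):          # (B, T, emb_dim)
            out, _ = self.lstm(word_embs)      # (B, T, 2*hidden)
            return out.max(dim=1).values       # (B, 2*hidden)

    enc = BiLSTMMaxEncoder()
    u = enc(torch.randn(2, 10, 300))           # premise
    v = enc(torch.randn(2, 12, 300))           # hypothesis
    # NLI classifier input, as in the paper: [u; v; |u - v|; u * v]
    feats = torch.cat([u, v, (u - v).abs(), u * v], dim=1)
    print(feats.shape)  # torch.Size([2, 16384])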

Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, Antoine Bordes
September 7, 2017

Natural Language Does Not Emerge ‘Naturally’ in Multi-Agent Dialog

Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, using a Task & Talk reference game between two agents as a testbed, we present a sequence of ‘negative’ results culminating in a ‘positive’ one – showing that while most agent-invented languages are effective (i.e. achieve near-perfect task rewards), they are decidedly not interpretable or compositional.

Satwik Kottur, José M.F. Moura, Stefan Lee, Dhruv Batra