November 10, 2019

Adversarial Bandits with Knapsacks

Symposium on Foundations of Computer Science (FOCS)

We consider Bandits with Knapsacks (henceforth, BwK), a general model for multi-armed bandits under supply/budget constraints. In particular, a bandit algorithm needs to solve a well-known knapsack problem: find an optimal packing of items into a limited-size knapsack. The BwK problem is a common generalization of numerous motivating examples, which range from dynamic pricing to repeated auctions to dynamic ad allocation to network routing and scheduling.

By: Nicole Immorlica, Karthik Abinav Sankararaman, Robert Schapire, Aleksandrs Slivkins
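The knapsack problem mentioned in the BwK abstract can be illustrated with the classic offline 0/1 knapsack dynamic program. This is a minimal sketch of that subproblem only, not the bandit algorithm from the paper: BwK is online, with unknown rewards and resource consumption, and can involve several resources. The function name `knapsack_max_value` and the toy items are hypothetical.

```python
from typing import List, Tuple


def knapsack_max_value(items: List[Tuple[int, int]], capacity: int) -> int:
    """Classic 0/1 knapsack: items are (size, value) pairs with integer sizes."""
    # best[c] = best total value achievable using total size at most c
    best = [0] * (capacity + 1)
    for size, value in items:
        # Iterate capacities downward so each item is packed at most once.
        for c in range(capacity, size - 1, -1):
            best[c] = max(best[c], best[c - size] + value)
    return best[capacity]


# Hypothetical toy instance: (size, value) pairs and a knapsack of size 5.
items = [(2, 3), (3, 4), (4, 5), (5, 6)]
print(knapsack_max_value(items, 5))  # 7: pack the items of size 2 and 3
```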

November 7, 2019

Feature Selection for Facebook Feed Ranking System via a Group-Sparsity-Regularized Training Algorithm

Conference on Information and Knowledge Management (CIKM)

In modern production platforms, large-scale online learning models are applied to data of very high dimension. To save computational resources, it is important to have an efficient algorithm to select the most significant features from an enormous feature pool. In this paper, we propose a novel neural-network-suitable feature selection algorithm, which selects important features from the input layer during training.

By: Xiuyan Ni, Yang Yu, Peng Wu, Youlin Li, Shaoliang Nie, Qichao Que, Chao Chen
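One standard way to apply a group-sparsity penalty to the input layer is a group-lasso proximal step (group soft-thresholding) after each gradient update. The sketch below assumes that `weights[j]` holds all first-layer weights leaving input feature j, so each row forms one group; a row driven exactly to zero means feature j can be dropped. The function names, `lam`, and `lr` are hypothetical, and the paper's actual training algorithm may differ.

```python
import math
from typing import List


def group_soft_threshold(weights: List[List[float]], lam: float, lr: float) -> List[List[float]]:
    """Proximal step for the penalty lam * sum_j ||w_j||_2 with step size lr.

    Assumption: weights[j] is the group of first-layer weights for input feature j.
    """
    threshold = lam * lr
    out = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        # Shrink the whole group toward zero; groups with small norm vanish entirely.
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        out.append([scale * w for w in row])
    return out


def selected_features(weights: List[List[float]]) -> List[int]:
    """Indices of input features whose weight group is still nonzero."""
    return [j for j, row in enumerate(weights) if any(w != 0.0 for w in row)]


W = [[3.0, 4.0],  # norm 5.0 -> scaled by 1 - 1/5
     [0.3, 0.4],  # norm 0.5 -> below the threshold, zeroed out
     [0.0, 2.0]]  # norm 2.0 -> scaled by 1 - 1/2
W = group_soft_threshold(W, lam=10.0, lr=0.1)
print(selected_features(W))  # [0, 2]
```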

October 28, 2019

Unsupervised Pre-Training of Image Features on Non-Curated Data

International Conference on Computer Vision (ICCV)

Pre-training general-purpose visual features with convolutional neural networks without relying on annotations is a challenging and important task. Most recent efforts in unsupervised feature learning have focused on either small or highly curated datasets like ImageNet, whereas using non-curated raw datasets was found to decrease the feature quality when evaluated on a transfer task. Our goal is to bridge the performance gap between unsupervised methods trained on curated data, which are costly to obtain, and massive raw datasets that are easily available.

By: Mathilde Caron, Piotr Bojanowski, Julien Mairal, Armand Joulin

October 27, 2019

Improved Conditional VRNNs for Video Prediction

International Conference on Computer Vision (ICCV)

Predicting future frames for a video sequence is a challenging generative modeling task. Promising approaches include probabilistic latent variable models such as the Variational Auto-Encoder (VAE). While VAEs can handle uncertainty and model multiple possible future outcomes, they have a tendency to produce blurry predictions. In this work we argue that this is a sign of underfitting.

By: Lluís Castrejón, Nicolas Ballas, Aaron Courville

October 27, 2019

Video Classification with Channel-Separated Convolutional Networks

International Conference on Computer Vision (ICCV)

This paper studies the effects of different design choices in 3D group convolutional networks for video classification. We empirically demonstrate that the amount of channel interactions plays an important role in the accuracy of 3D group convolutional networks.

By: Du Tran, Heng Wang, Lorenzo Torresani, Matt Feiszli
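The amount of channel interaction in a 3D group convolution is tied to its parameter count: with G groups, each output channel sees only C_in/G input channels. The arithmetic below assumes 3×3×3 kernels, 64 channels, and no bias, and compares a full convolution, an 8-group convolution, and a pointwise (1×1×1) plus depthwise (G = C) factorization. The helper `conv3d_params` is hypothetical and counts weights only.

```python
def conv3d_params(c_in: int, c_out: int, k: int = 3, groups: int = 1) -> int:
    """Weights in a k x k x k 3D convolution with the given number of groups (bias ignored)."""
    assert c_in % groups == 0 and c_out % groups == 0
    # Each output channel connects to c_in // groups input channels.
    return (c_in // groups) * c_out * k ** 3


C = 64
full = conv3d_params(C, C)               # every output channel sees all 64 inputs
grouped = conv3d_params(C, C, groups=8)  # each output channel sees 8 inputs
# Pointwise conv for channel mixing, then depthwise 3x3x3 for spatiotemporal filtering.
separated = conv3d_params(C, C, k=1) + conv3d_params(C, C, groups=C)
print(full, grouped, separated)  # 110592 13824 5824
```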

October 27, 2019

SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition

International Conference on Computer Vision (ICCV)

In this paper we introduce a lightweight “clip-sampling” model that can efficiently identify the most salient temporal clips within a long video. We demonstrate that the computational cost of action recognition on untrimmed videos can be dramatically reduced by invoking recognition only on these most salient clips. Furthermore, we show that this yields significant gains in recognition accuracy compared to analysis of all clips or randomly/uniformly selected clips.

By: Bruno Korbar, Du Tran, Lorenzo Torresani
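The inference scheme in the SCSampler abstract (score every clip cheaply, then run the expensive recognizer only on the most salient ones) can be sketched as a top-k selection followed by score averaging. The saliency scores, the `classifier` callable, and the averaging of class scores are hypothetical stand-ins, not the paper's models or aggregation rule.

```python
from typing import Callable, List, Sequence


def top_k_salient(saliency: Sequence[float], k: int) -> List[int]:
    """Indices of the k clips with the highest saliency, returned in temporal order."""
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    return sorted(ranked[:k])


def recognize_video(clips: Sequence[object],
                    saliency: Sequence[float],
                    classifier: Callable[[object], List[float]],
                    k: int) -> List[float]:
    """Run the (expensive) classifier only on the top-k clips and average their class scores."""
    if k < 1:
        raise ValueError("k must be at least 1")
    chosen = top_k_salient(saliency, k)
    scores = [classifier(clips[i]) for i in chosen]
    num_classes = len(scores[0])
    return [sum(s[c] for s in scores) / len(scores) for c in range(num_classes)]


# Hypothetical example: four clips, saliency from a lightweight sampler, toy classifier.
clips = [10, 20, 30, 40]
saliency = [0.1, 0.9, 0.3, 0.8]
print(recognize_video(clips, saliency, lambda clip: [float(clip), 1.0], k=2))  # [30.0, 1.0]
```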

October 27, 2019

DistInit: Learning Video Representations Without a Single Labeled Video

International Conference on Computer Vision (ICCV)

Video recognition models have progressed significantly over the past few years, evolving from shallow classifiers trained on hand-crafted features to deep spatiotemporal networks. However, labeled video data required to train such models has not been able to keep up with the ever-increasing depth and sophistication of these networks. In this work we propose an alternative approach to learning video representations that requires no semantically labeled videos, and instead leverages the years of effort in collecting and labeling large and clean still-image datasets.

By: Rohit Girdhar, Du Tran, Lorenzo Torresani, Deva Ramanan

October 26, 2019

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

International Conference on Computer Vision (ICCV)

In natural images, information is conveyed at different frequencies where higher frequencies are usually encoded with fine details and lower frequencies are usually encoded with global structures. Similarly, the output feature maps of a convolution layer can also be seen as a mixture of information at different frequencies. In this work, we propose to factorize the mixed feature maps by their frequencies, and design a novel Octave Convolution (OctConv) operation to store and process feature maps that vary spatially “slower” at a lower spatial resolution, reducing both memory and computation cost.

By: Yunpeng Chen, Haoqi Fan, Bing Xu, Zhicheng Yan, Yannis Kalantidis, Marcus Rohrbach, Shuicheng Yan, Jiashi Feng
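The memory saving from storing the slowly varying feature maps at lower resolution can be worked out directly. The sketch below assumes a fraction `alpha` of channels is assigned to the low-frequency group and kept at half the height and half the width (one octave lower), so that group needs a quarter of the spatial positions and total storage becomes (1 - alpha) + alpha / 4 of a plain layer. The helper names and the 64×56×56 shape are hypothetical.

```python
from typing import Tuple


def octave_split(channels: int, alpha: float) -> Tuple[int, int]:
    """Split channels into (high-frequency, low-frequency) counts; alpha is the low-frequency ratio."""
    low = int(round(alpha * channels))
    return channels - low, low


def relative_feature_memory(channels: int, alpha: float, h: int, w: int) -> float:
    """Storage of an octave layout relative to a plain channels x h x w feature map.

    Assumption: the low-frequency group is stored at (h // 2) x (w // 2).
    """
    high, low = octave_split(channels, alpha)
    octave = high * h * w + low * (h // 2) * (w // 2)
    return octave / (channels * h * w)


print(relative_feature_memory(64, 0.5, 56, 56))  # 0.625 = (1 - 0.5) + 0.5 / 4
```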

September 30, 2019

Neural Code Search Evaluation Dataset


There has been increasing interest in code search using natural language. Assessing the performance of such code search models can be difficult without a readily available evaluation suite. In this paper, we present an evaluation dataset of natural language query and code snippet pairs for future work. We also provide the results of two code search models ([6] and [1]) from recent work as a benchmark.

By: Hongyu Li, Seohyun Kim, Satish Chandra

September 17, 2019

Unsupervised Singing Voice Conversion


We present a deep learning method for singing voice conversion. The proposed network is not conditioned on the text or on the notes, and it directly converts the audio of one singer to the voice of another. Training is performed without any form of supervision: no lyrics or any kind of phonetic features, no notes, and no matching samples between singers.

By: Eliya Nachmani, Lior Wolf