August 7, 2017

Machine learning insights from ICML 2017

By: Facebook Research

Facebook AI researchers are presenting their latest research at the International Conference on Machine Learning (ICML) in Sydney, Australia this week. ICML, the leading international conference on machine learning, brings together researchers across industry and academia for an open international exchange of information.

In addition to presenting nine research papers that span various machine learning topics such as language modelling, optimization and unsupervised learning with images, the Facebook team is co-organizing the Video Games and Machine Learning (VGML) workshop.

Introducing WGAN

One paper of note, Wasserstein Generative Adversarial Networks, co-authored by Facebook and NYU researchers Martin Arjovsky, Soumith Chintala, and Leon Bottou, introduces an alternative and improved approach to traditional GAN training named Wasserstein GAN.

Although Generative Adversarial Networks (GANs) have shown promising abilities for unsupervised learning problems, the GAN training algorithm is difficult to use in practice because of its unstable numerical convergence. In this paper, the researchers propose to replace the GAN objective function with a convenient approximation of the Earth Mover (EM) distance. This proposal is supported by an analysis of how the EM distance behaves in comparison to popular probability distances and divergences used in the context of learning distributions. The researchers then define the Wasserstein GAN (WGAN), a form of GAN that minimizes this approximation of the EM distance and cures several known problems of the GAN training procedure.
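For reference, the EM distance and the approximation described above can be written out as follows. This is a summary in what we understand to be the paper's notation, where P_r is the real data distribution, P_g is the generator's distribution, f_w is the critic with weights w restricted to a compact set W, and g_theta is the generator. Treat it as a sketch rather than a full statement of the assumptions involved.

```latex
% Earth Mover (Wasserstein-1) distance: the minimum expected cost of
% transporting mass from P_r to P_g over all couplings \gamma.
W(\mathbb{P}_r, \mathbb{P}_g)
  = \inf_{\gamma \in \Pi(\mathbb{P}_r, \mathbb{P}_g)}
    \mathbb{E}_{(x, y) \sim \gamma}\big[\, \lVert x - y \rVert \,\big]

% Kantorovich-Rubinstein duality: the same quantity as a supremum
% over 1-Lipschitz functions f.
W(\mathbb{P}_r, \mathbb{P}_g)
  = \sup_{\lVert f \rVert_L \le 1}
    \mathbb{E}_{x \sim \mathbb{P}_r}[f(x)]
    - \mathbb{E}_{x \sim \mathbb{P}_g}[f(x)]

% WGAN approximation: restrict f to a parameterized critic f_w whose
% weights lie in a compact set \mathcal{W}, and sample from the generator.
\max_{w \in \mathcal{W}}
    \mathbb{E}_{x \sim \mathbb{P}_r}[f_w(x)]
    - \mathbb{E}_{z \sim p(z)}[f_w(g_\theta(z))]
```

The last expression recovers the EM distance only up to a constant factor that depends on the set W, which is why the text calls it an approximation.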

Paper figure 2: Different methods learning a mixture of 8 Gaussians spread in a circle. WGAN is able to learn the distribution without mode collapse. An interesting fact is that the WGAN (much like the Wasserstein distance) seems to capture first the low-dimensional structure of the data before matching the specific bumps in the density.

A primary benefit of WGAN is that it allows researchers to train the critic to completion. When the critic is trained to completion, it provides a loss to the generator, which can then be trained like any other neural network. The better the critic, the higher the quality of the gradients used to train the generator. This eliminates the need to carefully balance how well the GAN generator and discriminator are trained. In particular, the researchers observed that WGANs are more robust than GANs when the architectural choices for the generator are varied in certain ways.
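To make the critic-then-generator structure concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the one-parameter linear critic f_w(x) = w * x, the shift generator g_theta(z) = z + theta, the plain gradient steps, and every hyperparameter value are assumptions chosen so the example runs without any libraries. It does keep the two ingredients described above: several critic updates per generator update, and critic weights clipped to a compact interval.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def train_toy_wgan(real_mean=3.0, steps=200, n_critic=5, clip=1.0,
                   lr_critic=0.5, lr_gen=0.05, batch=64, seed=0):
    """Toy 1-D WGAN (illustrative assumptions throughout).

    Critic:    f_w(x) = w * x, with w clipped to [-clip, clip].
    Generator: g_theta(z) = z + theta, with z ~ N(0, 1).
    Real data: x ~ N(real_mean, 1).
    """
    rng = random.Random(seed)
    w = 0.0       # critic weight
    theta = 0.0   # generator shift
    history = []  # critic objective recorded once per generator step

    def sample_real():
        return [rng.gauss(real_mean, 1.0) for _ in range(batch)]

    def sample_fake():
        return [rng.gauss(0.0, 1.0) + theta for _ in range(batch)]

    for _ in range(steps):
        # Train the critic for several steps before touching the generator.
        for _ in range(n_critic):
            # Critic objective: E[f_w(real)] - E[f_w(fake)].
            # For a linear critic its gradient w.r.t. w is the gap in means.
            gap = mean(sample_real()) - mean(sample_fake())
            # Gradient ascent step followed by weight clipping.
            w = max(-clip, min(clip, w + lr_critic * gap))

        # Record the critic objective on fresh samples.
        history.append(w * (mean(sample_real()) - mean(sample_fake())))

        # Generator loss: -E[f_w(g_theta(z))] = -w * (E[z] + theta).
        # Its gradient w.r.t. theta is -w, so gradient descent adds lr_gen * w.
        theta += lr_gen * w

    return theta, history


theta, history = train_toy_wgan()
print(f"learned shift: {theta:.2f}")
print(f"first estimate: {history[0]:.2f}, last estimate: {history[-1]:.2f}")
```

In this toy setting the recorded critic objective roughly tracks the distance between the two means, so it shrinks as the generator's shift approaches the real mean. A real WGAN would use neural networks for both players and an off-the-shelf optimizer.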

One of the most compelling practical benefits of WGANs is the ability to continuously estimate the EM distance by optimally training the critic. Because these estimates correlate well with the observed sample quality, plotting the learning curves is very useful for debugging and hyper-parameter searches.

Paper figure 4: Training curves and samples at different stages of training. We can see a clear correlation between lower error and better sample quality. Upper left: the generator is an MLP with 4 hidden layers and 512 units at each layer. The loss decreases consistently as training progresses and sample quality increases. Upper right: the generator is a standard DCGAN. The loss decreases quickly and sample quality increases as well. In both upper plots the critic is a DCGAN without the sigmoid so losses can be subjected to comparison. Lower half: both the generator and the discriminator are MLPs with substantially high learning rates (so training failed). Loss is constant and samples are constant as well. The training curves were passed through a median filter for visualization purposes.
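The median filtering mentioned in the caption can be sketched in a few lines of plain Python. The window size, the shrinking window at the edges, and the sample values below are assumptions for illustration only; they are not the settings or data used for the paper's plots.

```python
from statistics import median


def median_filter(values, window=5):
    """Replace each point with the median of its neighbourhood.

    The window shrinks at both ends of the sequence so the output has
    the same length as the input (an assumed edge-handling choice).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


# Hypothetical noisy loss curve with two spikes (9.0 and 0.1).
noisy_curve = [5.0, 4.8, 9.0, 4.5, 4.2, 0.1, 3.9, 3.7]
smoothed = median_filter(noisy_curve, window=3)
print(smoothed)
```

A median filter suppresses isolated spikes without shifting the overall level of the curve, which suits loss curves that contain occasional outliers.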

You can learn more about WGAN at ICML or by reading the paper.

This is one highlight of a much larger body of work being presented by Facebook researchers at ICML. In addition to presenting papers and co-organizing the VGML workshop, Facebook researchers are on hand at ICML to collaborate with the community as we collectively seek to advance scientific knowledge across the field of machine learning.

Facebook Papers at ICML

High-Dimensional Variance-Reduced Stochastic Gradient Expectation-Maximization Algorithm
Rongda Zhu, Lingxiao Wang, Chengxiang Zhai, Quanquan Gu

An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis
Yuandong Tian

Convolutional Sequence to Sequence Learning
Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann Dauphin

Efficient softmax approximation for GPUs
Edouard Grave, Armand Joulin, Moustapha Cisse, David Grangier, Hervé Jégou

Gradient Boosted Decision Trees for High Dimensional Sparse Output
Si Si, Huan Zhang, Sathiya Keerthi, Dhruv Mahajan, Inderjit Dhillon, Cho-Jui Hsieh

Language Modeling with Gated Convolutional Networks
Yann Dauphin, Angela Fan, Michael Auli, David Grangier

Parseval Networks: Improving Robustness to Adversarial Examples
Moustapha Cisse, Piotr Bojanowski, Edouard Grave

Unsupervised Learning by Predicting Noise
Piotr Bojanowski, Armand Joulin

Wasserstein Generative Adversarial Networks
Martin Arjovsky, Soumith Chintala, Leon Bottou


The Video Games and Machine Learning (VGML) workshop focuses on complex games that pose interesting and hard challenges for machine learning. The workshop aims to bring together attendees from the video game industry and the machine learning (reinforcement learning) community, to the benefit of both.