Revisiting Classifier Two-Sample Tests for GAN Evaluation and Causal Discovery

International Conference on Learning Representations (ICLR) 2017


The goal of two-sample tests is to decide whether two probability distributions, denoted by P and Q, are equal. One way to construct flexible two-sample tests is to use binary classifiers. More specifically, pair n random samples drawn from P with a positive label, and pair n random samples drawn from Q with a negative label. Then, the test accuracy of a binary classifier on these data should remain near chance level if the null hypothesis “P = Q” is true. Moreover, this test accuracy is an average of independent random variables, and thus approaches a Gaussian null distribution. Finally, the prediction uncertainty of the binary classifier can be used to interpret the particular differences between P and Q: one can analyze which samples were correctly or incorrectly labeled by the classifier, with the least or most confidence.
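The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `c2st`, the 50/50 train/test split, the choice of a k-nearest-neighbour classifier with `k=5`, and the synthetic Gaussian data are all choices made here for illustration. The p-value relies only on the fact that, under the null hypothesis, each held-out prediction is correct with probability 1/2, so the accuracy over `n_te` held-out points is approximately Gaussian with mean 1/2 and variance 1/(4 n_te).

```python
import math
import random


def c2st(sample_p, sample_q, k=5, seed=0):
    """Classifier two-sample test (illustrative sketch, k-NN classifier).

    Returns the held-out accuracy and a one-sided p-value for H0: P = Q.
    """
    rng = random.Random(seed)
    # Label samples from P as 1 and samples from Q as 0, then shuffle.
    data = [(x, 1) for x in sample_p] + [(x, 0) for x in sample_q]
    rng.shuffle(data)
    half = len(data) // 2  # assumed 50/50 train/test split
    train, test = data[:half], data[half:]

    def predict(x):
        # Majority vote among the k nearest training points.
        nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
        votes = sum(label for _, label in nearest)
        return 1 if 2 * votes > k else 0

    n_te = len(test)
    accuracy = sum(predict(x) == y for x, y in test) / n_te
    # Under H0 the accuracy is approximately N(1/2, 1/(4 * n_te)).
    z = (accuracy - 0.5) / math.sqrt(0.25 / n_te)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    return accuracy, p_value


def gaussian_sample(rng, mean, n):
    """Draw n two-dimensional points with independent N(mean, 1) coordinates."""
    return [(rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)) for _ in range(n)]


rng = random.Random(1)
# Same distribution: accuracy should stay near chance level.
acc_same, p_same = c2st(gaussian_sample(rng, 0.0, 200), gaussian_sample(rng, 0.0, 200))
# Shifted distribution: accuracy should rise well above chance level.
acc_diff, p_diff = c2st(gaussian_sample(rng, 0.0, 200), gaussian_sample(rng, 2.0, 200))
print(f"same:    accuracy={acc_same:.3f}, p-value={p_same:.3g}")
print(f"shifted: accuracy={acc_diff:.3f}, p-value={p_diff:.3g}")
```

A small p-value is evidence against “P = Q”. Any classifier could be substituted for the k-NN vote, as long as the accuracy is measured on data the classifier was not trained on.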

In this paper, we aim to revive interest in the use of binary classifiers for two-sample testing. To this end, we review their fundamentals and the previous literature on their use, compare their performance against alternative state-of-the-art two-sample tests, and propose using them to evaluate generative adversarial network models applied to image synthesis.

As a by-product of our research, we propose the application of conditional generative adversarial networks, together with classifier two-sample tests, as an alternative to achieve state-of-the-art causal discovery.

