Efficient Softmax Approximation for GPUs

International Conference on Machine Learning (ICML)


We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expectation of computation time. Our approach further reduces the computational time by exploiting the specificities of modern architectures, in particular their efficiency at matrix-matrix operations, making it particularly suited for graphics processing units (GPUs). Our experiments, carried out on standard benchmarks such as EuroParl and One Billion Word, show that our approach brings a large gain in efficiency over standard approximations while achieving an accuracy close to that of the full softmax. The code for our method is available at github.com/facebookresearch/adaptive-softmax.
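As a concrete illustration of the technique the abstract describes, PyTorch ships an implementation of this adaptive softmax as nn.AdaptiveLogSoftmaxWithLoss. The sketch below shows minimal usage; the vocabulary size, hidden dimension, and cutoff values are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not values from the paper).
vocab_size = 100_000        # |V|: large output vocabulary
hidden_dim = 512            # dimension of the hidden states fed to the softmax
cutoffs = [2_000, 10_000]   # cluster boundaries: the head covers the 2,000 most
                            # frequent words; two tail clusters cover the rest

# Word indices must be sorted by decreasing frequency so the cutoffs
# match the unbalanced word distribution the method exploits.
adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden_dim,
    n_classes=vocab_size,
    cutoffs=cutoffs,
    div_value=4.0,          # each tail cluster uses a smaller projection
)

# Toy batch: hidden states from a language model and their target word ids.
hidden = torch.randn(32, hidden_dim)
targets = torch.randint(0, vocab_size, (32,))

out = adaptive_softmax(hidden, targets)
print(out.output.shape)     # (32,): log-probability of each target word
print(out.loss)             # mean negative log-likelihood over the batch

# Full log-probabilities over the vocabulary (slower; useful for evaluation).
log_probs = adaptive_softmax.log_prob(hidden)  # shape (32, vocab_size)
```

The cutoffs encode the frequency-based clustering: most targets fall in the small, frequently accessed head, while the rare tail clusters are visited seldom and use progressively smaller projections, which is where the expected savings in computation time come from.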

