Training with Low-precision Embedding Tables

Systems for Machine Learning Workshop at NeurIPS 2018


Since the success of GloVe and Word2Vec in natural language processing, continuous representations have been widely deployed in many other application domains. These applications range from encoding textual information to modeling users and items in recommender systems, and they use embedding vectors to represent a large number of objects. As the cardinality of the object sets increases, the embedding components quickly become the bottleneck in the training memory footprint. In this work, we focus on building a system that trains continuous embeddings in a low-precision floating-point representation. Specifically, our system performs SGD-style model updates in single-precision arithmetic, casts the updated parameters using stochastic rounding, and stores the parameters in half-precision floating point. Theoretically, we prove that for strongly convex objectives, our SGD-based training algorithm retains the same convergence rate up to constants. We also present a system-friendly implementation of a faster random number generator that increases runtime performance by 30%. We deploy our training system on deep neural networks with low-precision embedding tables for recommender systems, using both the public Criteo dataset and an internal dataset at Facebook. We empirically demonstrate that our half-precision floating-point training system achieves generalization performance matching that of a single-precision training system, with up to 2X memory savings and 1.2X faster training.
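
The update scheme described above (compute in single precision, cast with stochastic rounding, store in half precision) can be illustrated with a minimal NumPy sketch. This is not the system's implementation: the function names `stochastic_round_to_fp16` and `sgd_step_fp16` are hypothetical, the sketch assumes finite inputs within the half-precision range and unique row indices, and it uses NumPy's default generator rather than the faster random number generator mentioned in the abstract.

```python
import numpy as np


def stochastic_round_to_fp16(x, rng):
    """Round float32 values to float16, choosing one of the two bracketing
    float16 neighbors with probability proportional to proximity.

    Assumes finite inputs inside the float16 range (illustrative only).
    """
    x = np.asarray(x, dtype=np.float32)
    nearest = x.astype(np.float16)          # ordinary round-to-nearest cast
    nearest32 = nearest.astype(np.float32)

    # Find the float16 values just below and just above x.
    lower = np.where(nearest32 > x,
                     np.nextafter(nearest, np.float16(-np.inf)), nearest)
    upper = np.where(nearest32 < x,
                     np.nextafter(nearest, np.float16(np.inf)), nearest)

    lo32 = lower.astype(np.float32)
    gap = upper.astype(np.float32) - lo32
    gap_safe = np.where(gap > 0, gap, np.float32(1.0))  # avoid 0/0 when x is exact
    p_up = (x - lo32) / gap_safe            # probability of rounding up

    round_up = rng.random(x.shape) < p_up
    return np.where(round_up, upper, lower)


def sgd_step_fp16(table, row_ids, grads, lr, rng):
    """One SGD-style update on selected rows of a float16 embedding table.

    Arithmetic is done in float32; only the stored result is float16.
    Assumes row_ids contains no duplicates (hypothetical helper).
    """
    rows = table[row_ids].astype(np.float32)
    rows -= lr * grads.astype(np.float32)
    table[row_ids] = stochastic_round_to_fp16(rows, rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A value one quarter of the way between two adjacent float16 numbers.
    x = np.full(100_000, 1.0 + 2.0**-12, dtype=np.float32)
    rounded = stochastic_round_to_fp16(x, rng)
    print("mean after stochastic rounding:", rounded.astype(np.float64).mean())
    print("round-to-nearest result:", float(x[0].astype(np.float16)))
```

In this sketch, the expected value of the rounded result equals the single-precision input, whereas a plain round-to-nearest cast would silently discard any update smaller than half the spacing between adjacent half-precision values.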

