May 4, 2019
Quasi-Hyperbolic Momentum and Adam for Deep Learning
International Conference on Learning Representations (ICLR)
Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. We describe numerous connections to and identities with other algorithms, and we characterize the set of two-state optimization algorithms that QHM can recover.
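The "averaging a plain SGD step with a momentum step" idea can be sketched in a few lines. This is a minimal illustration, not the paper's reference implementation: the interpolation weight `nu` and the hyperparameter values below are illustrative defaults, and `qhm_step` is a hypothetical helper name.

```python
import numpy as np

def qhm_step(theta, g_buf, grad, lr=0.1, beta=0.9, nu=0.7):
    """One quasi-hyperbolic momentum (QHM) update (illustrative sketch).

    g_buf is the exponential moving average of gradients (the momentum
    buffer). nu interpolates between plain SGD (nu = 0) and momentum
    SGD (nu = 1); the defaults here are for demonstration only.
    """
    g_buf = beta * g_buf + (1 - beta) * grad              # momentum buffer update
    theta = theta - lr * ((1 - nu) * grad + nu * g_buf)   # averaged SGD/momentum step
    return theta, g_buf

# Usage: minimize f(theta) = 0.5 * ||theta||^2, whose gradient is theta.
theta = np.array([1.0, -2.0])
g_buf = np.zeros_like(theta)
for _ in range(100):
    theta, g_buf = qhm_step(theta, g_buf, grad=theta)
print(np.linalg.norm(theta))  # norm shrinks toward 0
```

Setting `nu = 0` recovers plain SGD and `nu = 1` recovers (normalized) momentum SGD, which is what makes QHM a strict generalization of both.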
By: Jerry Ma, Denis Yarats
Facebook AI Research
Natural Language Processing & Speech