Value-aware Quantization for Training and Inference of Neural Networks

European Conference on Computer Vision (ECCV)


We propose a novel value-aware quantization which applies aggressively reduced precision to the majority of data while separately handling a small amount of large values in high precision, which reduces total quantization errors under very low precision. We present new techniques to apply the proposed quantization to training and inference. The experiments show that our method with 3-bit activations (with 2% of large ones) can give the same training accuracy as full-precision one while offering significant (41.6% and 53.7%) reductions in the memory cost of activations in ResNet-152 and Inception-v3 compared with the state-of-the-art method. Our experiments also show that deep networks such as Inception-v3, ResNet-101 and DenseNet-121 can be quantized for inference with 4-bit weights and activations (with 1% 16-bit data) within 1% top-1 accuracy drop.

Related Publications

All Publications

Unsupervised Translation of Programming Languages

Baptiste Roziere, Marie-Anne Lachaux, Lowik Chanussot, Guillaume Lample

NeurIPS - December 1, 2020

Learning Reasoning Strategies in End-to-End Differentiable Proving

Pasquale Minervini, Sebastian Riedel, Pontus Stenetorp, Edward Grefenstette, Tim Rocktäschel

ICML - August 13, 2020

Voice Separation with an Unknown Number of Multiple Speakers

Eliya Nachmani, Yossi Adi, Lior Wolf

ICML - October 1, 2020

Constraining Dense Hand Surface Tracking with Elasticity

Breannan Smith, Chenglei Wu, He Wen, Patrick Peluse, Yaser Sheikh, Jessica Hodgins, Takaaki Shiratori

SIGGRAPH Asia - December 1, 2020

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookies Policy