AlphaNet: Improved Training of Supernets with Alpha-Divergence

International Conference on Machine Learning (ICML)


Weight-sharing neural architecture search (NAS) is an effective technique for automating efficient neural architecture design. Weight-sharing NAS builds a supernet that assembles all the architectures as its sub-networks and jointly trains the supernet with the sub-networks. The success of weight-sharing NAS heavily relies on distilling the knowledge of the supernet to the sub-networks. However, we find that the widely used distillation divergence, i.e., KL divergence, may lead to student sub-networks that overestimate or underestimate the uncertainty of the teacher supernet, leading to inferior performance of the sub-networks. In this work, we propose to improve supernet training with a more generalized α-divergence. By adaptively selecting the α-divergence, we simultaneously prevent both overestimation and underestimation of the uncertainty of the teacher model. We apply the proposed α-divergence-based supernet training to both slimmable neural networks and weight-sharing NAS, and demonstrate significant improvements. Specifically, our discovered model family, AlphaNet, outperforms prior-art models, including BigNAS, Once-for-All networks, and AttentiveNAS, across a wide range of FLOPs regimes. We achieve ImageNet top-1 accuracy of 80.0% with only 444M FLOPs. Our code and pretrained models are available at: facebookresearch/AlphaNet.
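
As a rough illustration of the idea in the abstract, the sketch below computes an α-divergence between a teacher distribution p and a student distribution q, then takes the larger of two such divergences as a distillation loss. The abstract does not give these details, so the following are assumptions made for illustration: the Amari form D_α(p‖q) = 1/(α(α−1)) · Σ_i q_i[(p_i/q_i)^α − 1], the reading of "adaptively selecting" as a maximum over two α values, and the defaults alpha_minus = -1 and alpha_plus = 1. All function names are hypothetical and are not taken from facebookresearch/AlphaNet.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for two strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def alpha_divergence(p, q, alpha):
    """Assumed Amari form: 1/(alpha(alpha-1)) * sum_i q_i * ((p_i/q_i)**alpha - 1).

    The limits alpha -> 1 and alpha -> 0 recover KL(p || q) and KL(q || p).
    """
    if abs(alpha - 1.0) < 1e-8:
        return kl(p, q)
    if abs(alpha) < 1e-8:
        return kl(q, p)
    s = sum(qi * ((pi / qi) ** alpha - 1.0) for pi, qi in zip(p, q))
    return s / (alpha * (alpha - 1.0))


def adaptive_alpha_divergence(p, q, alpha_minus=-1.0, alpha_plus=1.0):
    """Assumed adaptive rule: penalize whichever of the two divergences is larger."""
    return max(alpha_divergence(p, q, alpha_minus),
               alpha_divergence(p, q, alpha_plus))


# Hypothetical teacher and two students: one too confident, one too uncertain.
teacher = softmax([2.0, 1.0, 0.1])
overconfident_student = softmax([6.0, 1.0, 0.1])
underconfident_student = softmax([0.5, 0.4, 0.3])

loss_over = adaptive_alpha_divergence(teacher, overconfident_student)
loss_under = adaptive_alpha_divergence(teacher, underconfident_student)
```

Under these assumptions, a single fixed α weights the two kinds of mismatch unevenly, whereas the maximum is never smaller than either individual divergence, so neither kind of error can be hidden by the choice of α.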

Related Publications

CoNLL - November 9, 2021

Generalising to German Plural Noun Classes, from the Perspective of a Recurrent Neural Network

Verna Dankers, Anna Langedijk, Kate McCurdy, Adina Williams, Dieuwke Hupkes

EMNLP - October 1, 2021

Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little

Koustuv Sinha, Robin Jia, Dieuwke Hupkes, Joelle Pineau, Adina Williams, Douwe Kiela

IROS - September 30, 2021

Learning Navigation Skills for Legged Robots with Learned Robot Embeddings

Joanne Truong, Denis Yarats, Tianyu Li, Franziska Meier, Sonia Chernova, Dhruv Batra, Akshara Rai

IROS - September 27, 2021

Joint Sampling and Trajectory Optimization over Graphs for Online Motion Planning

Kalyan Vasudev Alwala, Mustafa Mukadam
