May 4, 2019
Quasi-Hyperbolic Momentum and Adam for Deep Learning
International Conference on Learning Representations (ICLR)
Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. We describe numerous connections to and identities with other algorithms, and we characterize the set of two-state optimization algorithms that QHM can recover.
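As a rough illustration of "averaging a plain SGD step with a momentum step", here is a minimal NumPy sketch of the QHM update rule. The toy quadratic objective, step count, and the hyperparameter values (alpha, beta, nu) are illustrative assumptions, not taken from the paper's experiments.

```python
import numpy as np

def qhm_step(theta, buf, grad, alpha=0.1, beta=0.999, nu=0.7):
    """One quasi-hyperbolic momentum (QHM) step: the update direction is a
    nu-weighted average of the plain SGD step (the current gradient) and
    the momentum step (the discounted gradient buffer)."""
    buf = (1.0 - beta) * grad + beta * buf
    theta = theta - alpha * ((1.0 - nu) * grad + nu * buf)
    return theta, buf

# Toy quadratic f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta, buf = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(2000):
    theta, buf = qhm_step(theta, buf, grad=theta)
print(theta)  # approaches the origin
```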
By: Jerry Ma, Denis Yarats
February 16, 2019
Machine Learning at Facebook: Understanding Inference at the Edge
IEEE International Symposium on High-Performance Computer Architecture (HPCA)
This paper takes a data-driven approach to present the opportunities and design challenges faced by Facebook in order to enable machine learning inference locally on smartphones and other edge platforms.
By: Carole-Jean Wu, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan, Kim Hazelwood, Eldad Isaac, Yangqing Jia, Bill Jia, Tommer Leyvand, Hao Lu, Yang Lu, Lin Qiao, Brandon Reagen, Joe Spisak, Fei Sun, Andrew Tulloch, Peter Vajda, Xiaodong Wang, Yanghan Wang, Bram Wasti, Yiming Wu, Ran Xian, Sungjoo Yoo, Peizhao Zhang
January 30, 2019
Large-Scale Visual Relationship Understanding
AAAI Conference on Artificial Intelligence (AAAI)
Large scale visual understanding is challenging, as it requires a model to handle the widely-spread and imbalanced distribution of
December 14, 2018
PyText: A seamless path from NLP research to production
We introduce PyText – a deep learning based NLP modeling framework built on PyTorch. PyText addresses the often-conflicting requirements of enabling rapid experimentation and of serving models at scale.
By: Ahmed Aly, Kushal Lakhotia, Shicong Zhao, Mrinal Mohit, Barlas Oğuz, Abhinav Arora, Sonal Gupta, Christopher Dewan, Stef Nelson-Lindall, Rushin Shah
December 8, 2018
SGD Implicitly Regularizes Generalization Error
Integration of Deep Learning Theories Workshop at NeurIPS
We derive a simple and model-independent formula for the change in the generalization gap due to a gradient descent update. We then compare the change in the test error for stochastic gradient descent to the change in test error from an equivalent number of gradient descent updates and show explicitly that stochastic gradient descent acts to regularize generalization error by decorrelating nearby updates.
By: Daniel A. Roberts
December 7, 2018
Bayesian Neural Networks using HackPPL with Application to User Location State Prediction
Bayesian Deep Learning Workshop at NeurIPS 2018
In this study, we present HackPPL as a probabilistic programming language in Facebook’s server-side language, Hack. One of the aims of our language is to support deep probabilistic modeling by providing a flexible interface for composing deep neural networks with encoded uncertainty and a rich inference engine.
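HackPPL itself is a Hack-language system, and its API is not reproduced in this listing. Purely to illustrate what composing a neural network layer with encoded uncertainty can look like, below is a language-agnostic NumPy sketch of a Monte Carlo forward pass through a linear layer with a factorized Gaussian over its weights; the posterior means and standard deviations are invented for illustration and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def bayesian_linear(x, w_mean, w_std, n_samples=200):
    """Monte Carlo forward pass through a linear layer whose weights carry
    encoded uncertainty (an independent Gaussian over each weight).
    Returns the predictive mean and standard deviation across samples."""
    outs = np.stack([x @ rng.normal(w_mean, w_std) for _ in range(n_samples)])
    return outs.mean(axis=0), outs.std(axis=0)

x = np.array([[1.0, 2.0]])            # one input with two features
w_mean = np.array([[0.5], [-0.3]])    # per-weight posterior means (invented)
w_std = np.array([[0.1], [0.2]])      # per-weight posterior spreads (invented)
pred_mean, pred_std = bayesian_linear(x, w_mean, w_std)
print(pred_mean, pred_std)            # a prediction together with its uncertainty
```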
By: Beliz Gokkaya, Jessica Ai, Michael Tingley, Yonglong Zhang, Ning Dong, Thomas Jiang, Anitha Kubendran, Aren Kumar
December 7, 2018
Causality in Physics and Effective Theories of Agency
Causal Learning Workshop at NeurIPS
We propose to combine reinforcement learning and theoretical physics to describe effective theories of agency. This involves understanding the connection between the physics notion of causality and how intelligent agents can arise as a useful effective description within some environments.
By: Daniel A. Roberts, Max Kleiman-Weiner
December 7, 2018
Rethinking floating point for deep learning
Systems for Machine Learning Workshop at NeurIPS 2018
We improve floating point to be more energy efficient than equivalent-bit-width integer hardware on a 28 nm ASIC process while retaining accuracy in 8 bits, using a novel hybrid log-multiply/linear-add scheme, Kulisch accumulation, and tapered encodings from Gustafson's posit format.
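The contribution here is a hardware number format, but the Kulisch-accumulation idea (accumulate exact products and round only once at the end) can be sketched in software. The use of fractions.Fraction and the random test vectors below are stand-ins for illustration, not the paper's 8-bit hybrid log/linear design.

```python
from fractions import Fraction
import random

def kulisch_style_dot(xs, ys):
    """Dot product with Kulisch-style exact accumulation: every float is a
    dyadic rational, so products and sums are kept exact and rounded back
    to a float only once, at the very end."""
    acc = Fraction(0)
    for x, y in zip(xs, ys):
        acc += Fraction(x) * Fraction(y)  # exact product, exact add
    return float(acc)                     # single final rounding

def naive_dot(xs, ys):
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y                      # rounds after every multiply-add
    return acc

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
ys = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
print(naive_dot(xs, ys), kulisch_style_dot(xs, ys))  # differ only by rounding error
```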
By: Jeff Johnson
December 6, 2018
Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis
Neural Information Processing Systems (NeurIPS)
Optimization algorithms that leverage gradient covariance information, such as variants of natural gradient descent (Amari, 1998), offer the prospect of yielding more effective descent directions. For models with many parameters, the covariance matrix they are based on becomes gigantic, making them inapplicable in their original form.
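Why Kronecker-factored approximations make this tractable can be shown with a generic NumPy identity check: if a layer's curvature is approximated as a Kronecker product of two small matrices, the preconditioned gradient can be computed from the small factors alone, without ever materializing the full covariance matrix. The sketch below illustrates only that identity under assumed random factors; it is not the authors' EKFAC implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 4, 3

# Symmetric positive-definite Kronecker factors standing in for the
# input-side and output-side covariance blocks of one layer.
A = rng.standard_normal((d_in, d_in));   A = A @ A.T + d_in * np.eye(d_in)
B = rng.standard_normal((d_out, d_out)); B = B @ B.T + d_out * np.eye(d_out)
G = rng.standard_normal((d_out, d_in))   # the layer's gradient

# Naive preconditioning: build the full (d_in*d_out)^2 matrix and solve with it.
F = np.kron(A, B)
naive = np.linalg.solve(F, G.flatten(order="F")).reshape((d_out, d_in), order="F")

# Factored preconditioning: (A kron B)^-1 vec(G) = vec(B^-1 G A^-1),
# which only ever touches the two small factors.
factored = np.linalg.solve(B, G) @ np.linalg.inv(A)

print(np.allclose(naive, factored))  # True
```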
By: Thomas George, Cesar Laurent, Xavier Bouthillier, Nicolas Ballas, Pascal Vincent
December 4, 2018
DeepFocus: Learned Image Synthesis for Computational Displays
ACM SIGGRAPH Asia 2018
In this paper, we introduce DeepFocus, a generic, end-to-end convolutional neural network designed to efficiently solve the full range of computational tasks for accommodation-supporting HMDs. This network is demonstrated to accurately synthesize defocus blur, focal stacks, multilayer decompositions, and multiview imagery using only commonly available RGB-D images, enabling real-time, near-correct depictions of retinal blur with a broad set of accommodation-supporting HMDs.
By: Lei Xiao, Anton Kaplanyan, Alexander Fix, Matthew Chapman, Douglas Lanman