
June 2, 2019

The emergence of number and syntax units in LSTM language models

Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)

We present here a detailed study of the inner mechanics of number tracking in LSTMs at the single neuron level. We discover that long-distance number information is largely managed by two “number units”.

By: Yair Lakretz, Germán Kruszewski, Theo Desbordes, Dieuwke Hupkes, Stanislas Dehaene, Marco Baroni

May 4, 2019

Quasi-Hyperbolic Momentum and Adam for Deep Learning

International Conference on Learning Representations (ICLR)

Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. We describe numerous connections to and identities with other algorithms, and we characterize the set of two-state optimization algorithms that QHM can recover.

By: Jerry Ma, Denis Yarats
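
The update described above — averaging a plain SGD step with a momentum step — can be sketched in a few lines. This is a minimal NumPy illustration based on the abstract, not the paper's reference implementation; the hyperparameter values below are illustrative, chosen so the toy problem converges quickly.

```python
import numpy as np

def qhm_step(theta, g_buf, grad, lr=0.1, beta=0.9, nu=0.7):
    """One quasi-hyperbolic momentum (QHM) update.

    The parameter step averages a plain SGD step (weight 1 - nu)
    with a momentum step (weight nu) taken from an exponential
    moving average of past gradients.
    """
    g_buf = beta * g_buf + (1.0 - beta) * grad          # momentum buffer (EMA)
    theta = theta - lr * ((1.0 - nu) * grad + nu * g_buf)
    return theta, g_buf

# Toy usage: minimize f(x) = x^2, whose gradient is 2x.
theta, g_buf = np.array([5.0]), np.zeros(1)
for _ in range(200):
    grad = 2.0 * theta
    theta, g_buf = qhm_step(theta, g_buf, grad)
```

Setting nu = 1 recovers (a normalized variant of) momentum SGD, and nu = 0 recovers plain SGD, which is one way to see QHM as an interpolation between the two.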

March 14, 2019

On the Pitfalls of Measuring Emergent Communication


In this paper, we examine a few intuitive existing metrics for measuring communication, and show that they can be misleading. Specifically, by training deep reinforcement learning agents to play simple matrix games augmented with a communication channel, we find a scenario where agents appear to communicate (their messages provide information about their subsequent action), and yet the messages do not impact the environment or the other agent in any way.

By: Ryan Lowe, Jakob Foerster, Y-Lan Boureau, Joelle Pineau, Yann Dauphin

March 12, 2019

Convolutional neural networks for mesh-based parcellation of the cerebral cortex

Medical Imaging with Deep Learning (MIDL)

We show experimentally on the Human Connectome Project dataset that the proposed graph convolutional models outperform the current state of the art and baselines, highlighting the potential and applicability of these methods for tackling neuroimaging challenges and paving the way towards a better characterization of brain diseases.

By: Guillem Cucurull, Konrad Wagstyl, Arantxa Casanova, Petar Velickovic, Estrid Jakobsen, Michal Drozdzal, Adriana Romero, Alan Evans, Yoshua Bengio

February 16, 2019

Machine Learning at Facebook: Understanding Inference at the Edge

IEEE International Symposium on High-Performance Computer Architecture (HPCA)

This paper takes a data-driven approach to present the opportunities and design challenges faced by Facebook in order to enable machine learning inference locally on smartphones and other edge platforms.

By: Carole-Jean Wu, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan, Kim Hazelwood, Eldad Isaac, Yangqing Jia, Bill Jia, Tommer Leyvand, Hao Lu, Yang Lu, Lin Qiao, Brandon Reagen, Joe Spisak, Fei Sun, Andrew Tulloch, Peter Vajda, Xiaodong Wang, Yanghan Wang, Bram Wasti, Yiming Wu, Ran Xian, Sungjoo Yoo, Peizhao Zhang

January 30, 2019

Large-Scale Visual Relationship Understanding

AAAI Conference on Artificial Intelligence (AAAI)

Large scale visual understanding is challenging, as it requires a model to handle the widely-spread and imbalanced distribution of triples. In real-world scenarios with large numbers of objects and relations, some are seen very commonly while others are barely seen. We develop a new relationship detection model that embeds objects and relations into two vector spaces where both discriminative capability and semantic affinity are preserved.

By: Ji Zhang, Yannis Kalantidis, Marcus Rohrbach, Manohar Paluri, Ahmed Elgammal, Mohamed Elhoseiny

January 7, 2019

On the Dimensionality of Embeddings for Sparse Features and Data


In this note we discuss a common misconception, namely that embeddings are always used to reduce the dimensionality of the item space.

By: Maxim Naumov
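
The misconception is easy to illustrate: an embedding lookup can just as well map a small item space into a *higher*-dimensional vector space. The sketch below (sizes and names are illustrative, not taken from the note) embeds a 4-item categorical feature into 16 dimensions — an increase over the 4-dimensional one-hot representation.

```python
import numpy as np

rng = np.random.default_rng(0)
num_items, dim = 4, 16                       # 4 items embedded into 16 dims
table = rng.normal(size=(num_items, dim))    # dense embedding table

item_ids = np.array([0, 3, 1])               # a batch of sparse feature values
vecs = table[item_ids]                       # lookup: one row per item id
```

Here `vecs` has shape `(3, 16)`: each sparse id is replaced by a 16-dimensional dense vector, so the embedding expands rather than reduces the dimensionality of the item space.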

December 14, 2018

PyText: A seamless path from NLP research to production

We introduce PyText – a deep learning based NLP modeling framework built on PyTorch. PyText addresses the often-conflicting requirements of enabling rapid experimentation and of serving models at scale.

By: Ahmed Aly, Kushal Lakhotia, Shicong Zhao, Mrinal Mohit, Barlas Oğuz, Abhinav Arora, Sonal Gupta, Christopher Dewan, Stef Nelson-Lindall, Rushin Shah

December 8, 2018

SGD Implicitly Regularizes Generalization Error

Integration of Deep Learning Theories Workshop at NeurIPS

We derive a simple and model-independent formula for the change in the generalization gap due to a gradient descent update. We then compare the change in the test error for stochastic gradient descent to the change in test error from an equivalent number of gradient descent updates and show explicitly that stochastic gradient descent acts to regularize generalization error by decorrelating nearby updates.

By: Daniel A. Roberts

December 7, 2018

Bayesian Neural Networks using HackPPL with Application to User Location State Prediction

Bayesian Deep Learning Workshop at NeurIPS 2018

In this study, we present HackPPL as a probabilistic programming language in Facebook’s server-side language, Hack. One of the aims of our language is to support deep probabilistic modeling by providing a flexible interface for composing deep neural networks with encoded uncertainty and a rich inference engine.

By: Beliz Gokkaya, Jessica Ai, Michael Tingley, Yonglong Zhang, Ning Dong, Thomas Jiang, Anitha Kubendran, Aren Kumar