Organizers: Jason Weston, Antoine Bordes, Sumit Chopra
Duration: one day (see format below)
Motivation and Objective of the Workshop
In order to solve AI, a key component is the use of long-term dependencies as well as short-term context during inference, i.e., the interplay of reasoning, attention and memory. The machine learning community has had great success over the last decades at solving basic prediction tasks such as text classification, image annotation and speech recognition. However, solutions to deeper reasoning tasks have remained elusive. Until recently, most existing machine learning models lacked an easy way to read from and write to part of a (potentially very large) long-term memory component, and to combine this seamlessly with inference. To combine memory with reasoning, a model must learn how to access it, i.e. to perform *attention* over its memory.
Within the last year or so, in part inspired by some earlier works [8, 9, 14, 15, 16, 18, 19], there has been notable progress in these areas, which this workshop addresses. Models developing notions of attention [12, 5, 6, 7, 20, 21] have shown positive results on a number of real-world tasks such as machine translation and image captioning. There has also been a surge in building models of computation that explore differing forms of explicit storage [1, 10, 11, 13, 17]. For example, it was recently shown how to learn a model that sorts a small set of numbers, as well as a host of other symbolic manipulation tasks. Another promising direction is work employing a large long-term memory for reading comprehension; the capability of somewhat deeper reasoning has been shown on synthetic data, and promising results are starting to appear on real data [3, 4].
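To make "attention over memory" concrete, here is a minimal sketch of a soft memory read, assuming each memory slot has a key vector (used for matching) and a value vector (what gets read out). The query is scored against every key with a dot product, the scores are normalised with a softmax, and the read-out is the weighted sum of the values. This is a generic illustration loosely in the style of end-to-end memory models such as [13], not the exact formulation of any cited paper; the function names and the toy memory are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Normalise raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_read(query: Vector, keys: List[Vector],
              values: List[Vector]) -> Tuple[List[float], Vector]:
    """Attend over memory: weight each slot by softmax(query . key) and
    return (attention weights, weighted sum of the slot values)."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    read = [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(dim)]
    return weights, read


# Hypothetical 3-slot memory; the query is most similar to slot 1's key.
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, read = soft_read([0.0, 4.0], keys, values)
print([round(w, 3) for w in weights], [round(r, 2) for r in read])
```

Because every slot receives some weight, the read is differentiable and can be trained jointly with the rest of a model, but it also costs one score per slot, which is why retrieval at very large scale remains an open issue.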
In spite of this resurgence, research into learning algorithms that combine these components, and the analysis of such algorithms, is still in its infancy. The purpose of this workshop is to bring together researchers from diverse backgrounds to exchange ideas that could help address the various drawbacks of such models and lead to more interesting models in the quest towards true AI. We thus plan to focus on the following issues:
- How to decide what to write and what not to write in the memory.
- How to represent knowledge to be stored in memories.
- Types of memory (arrays, stacks, or memory stored within the weights of the model), when they should be used, and how they can be learnt.
- How to retrieve relevant knowledge from memories quickly when the scale is huge.
- How to build hierarchical memories, e.g. employing multiscale notions of attention.
- How to build hierarchical reasoning, e.g. via composition of functions.
- How to incorporate forgetting/compression of information which is not important.
- How to properly evaluate reasoning models. Which tasks provide proper coverage and also allow an unambiguous interpretation of a system’s capabilities? Are artificial tasks a convenient way to do this?
- Can we draw inspiration from how animal or human memories are stored and used?
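The list above contrasts array-like memories with stacks. As a hedged illustration of how a stack can be made differentiable, the sketch below keeps a fixed-depth stack of scalars and blends the outcomes of push, pop and no-op by their probabilities, loosely in the spirit of the stack-augmented recurrent nets cited as [10]. The fixed depth, the scalar entries and the function name `soft_stack_step` are simplifying assumptions; in a learned model the action probabilities and the pushed value would be produced by a controller network.

```python
from typing import List


def soft_stack_step(stack: List[float], value: float,
                    p_push: float, p_pop: float, p_noop: float) -> List[float]:
    """One update of a fixed-depth continuous ("soft") stack.

    The new stack is a convex mixture of three candidate stacks: the one
    obtained by pushing `value`, the one obtained by popping, and the
    unchanged stack. The three weights must be non-negative and sum to 1.
    """
    if abs(p_push + p_pop + p_noop - 1.0) > 1e-9:
        raise ValueError("action probabilities must sum to 1")
    pushed = [value] + stack[:-1]   # shift down; the bottom entry falls off
    popped = stack[1:] + [0.0]      # shift up; the bottom is refilled with 0
    return [p_push * a + p_pop * b + p_noop * c
            for a, b, c in zip(pushed, popped, stack)]


# Hard (one-hot) actions behave like an ordinary stack ...
s = [0.0, 0.0, 0.0]
s = soft_stack_step(s, 1.0, 1.0, 0.0, 0.0)   # push 1
s = soft_stack_step(s, 2.0, 1.0, 0.0, 0.0)   # push 2
print(s)                                      # [2.0, 1.0, 0.0]
# ... while mixed actions give a weighted blend of push and pop.
print(soft_stack_step(s, 3.0, 0.5, 0.5, 0.0))  # [2.0, 1.0, 0.5]
```

Since every update is a weighted sum, gradients can flow through the choice of action, which is what lets such a memory be trained with ordinary backpropagation.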
The workshop will devote most of its time to invited talks, contributed talks and a panel discussion. To avoid a mini-conference effect, there will be no posters. To encourage interaction, a webpage will provide real-time updates and allow people to post questions before or during the workshop; these will be asked at the end of talks or during the panel, or answered online.
Important Dates
- Submission Deadline: Oct 9, 2015
- Notification of Acceptance: Oct 23, 2015
- Workshop: Dec 12, 2015
Paper Submission Instructions
Authors are encouraged to submit papers on topics related to reasoning, memory and attention, strictly adhering to the following guidelines:
- Papers should be typeset according to the NIPS format.
- Papers should not exceed 4 pages (including references).
- Submit to: firstname.lastname@example.org
- The authors of all accepted papers will be expected to give a 20-minute talk (15 min for the talk + 5 min for questions).
- Accepted papers will be displayed on the website.
- There will be no posters.
References
[1] Neural Turing Machines. Alex Graves, Greg Wayne, Ivo Danihelka. arXiv Pre-Print, 2014.
[2] Memory Networks. Jason Weston, Sumit Chopra, Antoine Bordes. International Conference on Learning Representations, 2015.
[3] Teaching Machines to Read and Comprehend. Karl Moritz Hermann et al. arXiv Pre-Print, 2015.
[4] Large-scale Simple Question Answering with Memory Networks. Antoine Bordes, Nicolas Usunier, Sumit Chopra, Jason Weston. arXiv Pre-Print, 2015.
[5] Neural Machine Translation by Jointly Learning to Align and Translate. D. Bahdanau, K. Cho, Y. Bengio. International Conference on Learning Representations, 2015.
[6] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Kelvin Xu et al. arXiv Pre-Print, 2015.
[7] Attention-Based Models for Speech Recognition. Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio. arXiv Pre-Print, 2015.
[8] Learning context-free grammars: Capabilities and limitations of a recurrent neural network with an external stack memory. S. Das, C. L. Giles, G. Z. Sun. In ACCSS, 1992.
[9] Neural Net Architectures for Temporal Sequence Processing. Michael C. Mozer. In Santa Fe Institute Studies in the Sciences of Complexity, volume 15.
[10] Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets. Armand Joulin, Tomas Mikolov. arXiv Pre-Print, 2015.
[11] Reinforcement Learning Neural Turing Machines. Wojciech Zaremba, Ilya Sutskever. arXiv Pre-Print, 2015.
[12] Generating sequences with recurrent neural networks. Alex Graves. arXiv Pre-Print, 2013.
[13] End-To-End Memory Networks. S. Sukhbaatar, A. Szlam, J. Weston, R. Fergus. arXiv Pre-Print, 2015.
[14] Long short-term memory. Sepp Hochreiter, Jürgen Schmidhuber. Neural Computation, 9(8):1735-1780, 1997.
[15] Learning to control fast-weight memories: An alternative to dynamic recurrent networks. Jürgen Schmidhuber. Neural Computation, 4(1):131-139, 1992.
[16] A self-referential weight matrix. Jürgen Schmidhuber. In ICANN'93, pp. 446-450. Springer, 1993.
[17] Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. Kumar et al. arXiv Pre-Print, 2015.
[18] Learning to combine foveal glimpses with a third-order Boltzmann machine. Hugo Larochelle, Geoffrey E. Hinton. In NIPS, pp. 1243-1251, 2010.
[19] Learning where to attend with deep architectures for image tracking. Denil et al. Neural Computation, 2012.
[20] Recurrent models of visual attention. V. Mnih, N. Heess, A. Graves, K. Kavukcuoglu. In NIPS, 2014.
[21] A Neural Attention Model for Abstractive Sentence Summarization. A. M. Rush, S. Chopra, J. Weston. EMNLP, 2015.
Format
7 Invited Speaker Talks – 35 min each (30 min + 5 min for questions)
6 Contributed Talks – 20 min each (15 min + 5 min for questions)
4 Contributed Lightning Talks – 5 min each
1 Panel Discussion Session – 35 min
8:20AM – 8:30AM : Introduction
8:30AM – 10:00AM : 2 Invited Talks, 1 Contributed Talk
10:30AM – 12:30PM : 2 Invited Talks, 2 Contributed Talks, 2 Lightning Talks
2:30PM – 4:30PM : 2 Invited Talks, 2 Contributed Talks, 2 Lightning Talks
5:00PM – 6:30PM : 1 Invited Talk, 1 Contributed Talk, 1 Panel Discussion
(There are no posters in the workshop.)
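As a quick consistency check, the short script below tallies each session against the talk lengths listed in the format, using only numbers from this page; the 10-minute introduction and the breaks between sessions are left out. It is our own illustration, not part of the workshop materials. Each session's time slot is filled exactly, and the counts add up to 7 invited talks, 6 contributed talks, 4 lightning talks and 1 panel.

```python
from datetime import datetime

# Talk lengths in minutes, taken from the format list.
LENGTH = {"invited": 35, "contributed": 20, "lightning": 5, "panel": 35}

# (start, end, {talk type: count}) for each session, transcribed from the schedule.
SESSIONS = [
    ("8:30AM", "10:00AM", {"invited": 2, "contributed": 1}),
    ("10:30AM", "12:30PM", {"invited": 2, "contributed": 2, "lightning": 2}),
    ("2:30PM", "4:30PM", {"invited": 2, "contributed": 2, "lightning": 2}),
    ("5:00PM", "6:30PM", {"invited": 1, "contributed": 1, "panel": 1}),
]


def minutes_between(start: str, end: str) -> int:
    """Length of a same-day time slot given as e.g. '8:30AM' and '10:00AM'."""
    fmt = "%I:%M%p"  # 12-hour clock with AM/PM suffix
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def check_schedule():
    """Return per-session (start, end, available, scheduled) minutes
    and the total number of talks of each type."""
    rows, totals = [], {kind: 0 for kind in LENGTH}
    for start, end, talks in SESSIONS:
        slot = minutes_between(start, end)
        used = sum(LENGTH[kind] * n for kind, n in talks.items())
        rows.append((start, end, slot, used))
        for kind, n in talks.items():
            totals[kind] += n
    return rows, totals


rows, totals = check_schedule()
for start, end, slot, used in rows:
    print(f"{start}-{end}: {slot} min available, {used} min scheduled")
print(totals)
```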
Accepted Papers
- Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences. Hongyuan Mei, TTI-Chicago; Mohit Bansal, TTI-Chicago; Matthew R. Walter, TTI-Chicago.
- Learning Deep Neural Network Policies with Continuous Memory States. Marvin Zhang, UC Berkeley; Zoe McCarthy, UC Berkeley; Chelsea Finn, UC Berkeley; Sergey Levine, UC Berkeley; Pieter Abbeel, UC Berkeley.
- Generating Images from Captions with Attention. Elman Mansimov, University of Toronto; Emilio Parisotto, University of Toronto; Jimmy Lei Ba, University of Toronto; Ruslan Salakhutdinov, University of Toronto.
- Neural Models for Simple Algorithmic Games. Sainbayar Sukhbaatar, Facebook AI Research; Arthur Szlam, Facebook AI Research; Rob Fergus, Facebook AI Research.
- Towards Neural Network-based Reasoning. Baolin Peng, The Chinese University of Hong Kong; Zhengdong Lu, Noah’s Ark Lab, Huawei Technologies; Hang Li, Noah’s Ark Lab, Huawei Technologies; Kam-Fai Wong, The Chinese University of Hong Kong.
- Structured Memory for Neural Turing Machines. Wei Zhang, IBM Watson; Yang Yu, IBM Watson; Bowen Zhou, IBM Watson.
- Chess Q&A: Question Answering on Chess Games. Volkan Cirik, CMU; Louis-Philippe Morency, CMU; Eduard Hovy, CMU.
- Evolving Neural Turing Machines. Rasmus Boll Greve, IT University of Copenhagen; Emil Juul Jacobsen, IT University of Copenhagen; Sebastian Risi, IT University of Copenhagen.
- Considerations for Evaluating Models of Language Understanding and Reasoning. Gabriel Recchia, University of Cambridge.
- Learning to learn neural networks. Tom Bosc, Inria.
Invited Speakers
Alex Graves, Google DeepMind
Alex Graves is a research scientist at Google DeepMind. His work focuses on developing recurrent neural networks for sequence learning, and now features prominently in areas such as speech recognition, handwriting synthesis, and generative sequence modelling. Alex holds a BSc in Theoretical Physics from Edinburgh, completed Part III Maths at Cambridge and a PhD in AI at IDSIA with Juergen Schmidhuber, and then held postdocs at TU Munich and with Geoff Hinton at the University of Toronto. Most recently he has been spearheading DeepMind’s development of Neural Turing Machines.
Yoshua Bengio, University of Montreal
Yoshua Bengio received a PhD in Computer Science from McGill University, Canada in 1991. After two post-doctoral years, one at M.I.T. with Michael Jordan and one at AT&T Bell Laboratories with Yann LeCun and Vladimir Vapnik, he became a professor in the Department of Computer Science and Operations Research at Université de Montréal. He is the author of two books and more than 200 publications, the most cited being in the areas of deep learning, recurrent neural networks, probabilistic learning algorithms, natural language processing and manifold learning. He is among the most cited Canadian computer scientists and is or has been associate editor of the top journals in machine learning and neural networks. He has held a Canada Research Chair in Statistical Learning Algorithms since 2000 and an NSERC Industrial Chair since 2006; he has been a Senior Fellow of the Canadian Institute for Advanced Research since 2005, and since 2014 he has co-directed its program focused on deep learning. He is on the board of the NIPS Foundation and has been program chair and general chair for NIPS. He has co-organized the Learning Workshop for 14 years and co-created the new International Conference on Learning Representations. His current interests are centered around a quest for AI through machine learning, and include fundamental questions on deep learning and representation learning, the geometry of generalization in high-dimensional spaces, manifold learning, biologically inspired learning algorithms, and challenging applications of statistical machine learning.
Ilya Sutskever, Google Brain
Ilya Sutskever received his PhD in 2012 from the University of Toronto, working with Geoffrey Hinton. After completing his PhD, he co-founded DNNResearch with Geoffrey Hinton and Alex Krizhevsky; the company was acquired by Google. He is interested in all aspects of neural networks and their applications.
Kyunghyun Cho, New York University
Kyunghyun Cho is an assistant professor at the Department of Computer Science, Courant Institute of Mathematical Sciences and the Center for Data Science of New York University (NYU). Before joining NYU in September 2015, he was a postdoctoral researcher at the University of Montreal under the supervision of Prof. Yoshua Bengio, after obtaining his doctorate from Aalto University (Finland) in early 2014. His main research interests include neural networks, generative models and their applications, especially to natural language understanding.
Mike Mozer, University of Colorado
Michael Mozer received a Ph.D. in Cognitive Science from the University of California at San Diego in 1987. Following a postdoctoral fellowship with Geoffrey Hinton at the University of Toronto, he joined the faculty at the University of Colorado at Boulder and is presently a Professor in the Department of Computer Science and the Institute of Cognitive Science. He is secretary of the Neural Information Processing Systems Foundation and has served as chair of the Cognitive Science Society. His research involves developing computational models to help understand the mechanisms of cognition. He uses these models to build software that assists individuals in learning, remembering, and decision making.
Adrien Peyrache, New York University
After graduating in physics from ESPCI-ParisTech, Adrien Peyrache studied cognitive science in a joint MSc program at Pierre and Marie Curie University and Ecole Normale Supérieure (Paris, France). In 2009, he completed his PhD in neuroscience at the Collège de France. His thesis focused on the neuronal substrate of sleep-dependent learning and memory. After a year of postdoctoral training at the CNRS (Gif-sur-Yvette, France), where he studied the coordination of neuronal activity during sleep, he moved four years ago to the laboratory of György Buzsaki at the New York University Neuroscience Institute. Since then, he has devoted his work to leveraging the unique technical expertise in high-density neuronal population recordings to characterize the self-organized mechanisms of neuronal activity in the navigation system.
Jürgen Schmidhuber, Swiss AI Lab IDSIA
Since age 15 or so, the main goal of Professor Jürgen Schmidhuber (pronounce: You_again Shmidhoobuh) has been to build a self-improving Artificial Intelligence (AI) smarter than himself, then retire. He has pioneered self-improving general problem solvers since 1987, and Deep Learning Neural Networks (NNs) since 1991. The recurrent NNs (RNNs) developed by his research groups at the Swiss AI Lab IDSIA & USI & SUPSI & TU Munich were the first RNNs to win official international contests. They have revolutionized connected handwriting recognition, speech recognition, machine translation, optical character recognition, image caption generation, and are now in use at Google, Microsoft, IBM, Baidu, and many other companies. Founders & staff of DeepMind (sold to Google for over 600M) include 4 former PhD students from his lab. His team’s Deep Learners were the first to win object detection and image segmentation contests, and achieved the world’s first superhuman visual classification results, winning nine international competitions in machine learning & pattern recognition (more than any other team). They were also the first to learn control policies directly from high-dimensional sensory input using reinforcement learning.
His research group also established the field of mathematically rigorous universal AI and optimal universal problem solvers. His formal theory of creativity & curiosity & fun explains art, science, music, and humor. He also generalized algorithmic information theory and the many-worlds theory of physics, and introduced the concept of Low-Complexity Art, the information age’s extreme form of minimal art. Since 2009 he has been a member of the European Academy of Sciences and Arts. He has published 333 peer-reviewed papers, earned seven best paper/best video awards, the 2013 Helmholtz Award of the International Neural Network Society, and the 2016 IEEE Neural Networks Pioneer Award. He is also president of NNAISENSE, which aims at building the first practical general purpose AI.
Organizers
Jason Weston, Facebook AI Research (http://www.jaseweston.com/)
Jason Weston has been a research scientist at Facebook, NY, since Feb 2014. He earned his PhD in machine learning at Royal Holloway, University of London and at AT&T Research in Red Bank, NJ (advisors: Alex Gammerman, Volodya Vovk and Vladimir Vapnik) in 2000. From 2000 to 2002, he was a researcher at Biowulf Technologies, New York. From 2002 to 2003 he was a research scientist at the Max Planck Institute for Biological Cybernetics, Tuebingen, Germany. From 2003 to 2009 he was a research staff member at NEC Labs America, Princeton. From 2009 to 2014 he was a research scientist at Google, NY. His interests lie in statistical machine learning and its application to text, audio and images. Jason has published over 100 papers, including papers that won best paper awards at ICML and ECML. He was also part of the YouTube team that won a National Academy of Television Arts & Sciences Emmy Award for Technology and Engineering for Personalized Recommendation Engines for Video Discovery.
Antoine Bordes, Facebook AI Research (https://www.hds.utc.fr/~bordesan/)
Antoine Bordes is a staff research scientist at Facebook Artificial Intelligence Research. Prior to joining Facebook in 2014, he was a CNRS staff researcher in the Heudiasyc laboratory of the University of Technology of Compiegne in France. In 2010, he was a postdoctoral fellow in Yoshua Bengio’s lab at the University of Montreal. He received his PhD in machine learning from Pierre & Marie Curie University in Paris in early 2010. From 2004 to 2009, he collaborated regularly with Léon Bottou at NEC Labs America in Princeton. He received two best PhD awards, from the French Association for Artificial Intelligence and from the French Armament Agency, as well as a Scientific Excellence Scholarship awarded by CNRS in 2013. Antoine’s current interests cover knowledge base/graph modeling, natural language processing, deep learning and large-scale learning.
Sumit Chopra, Facebook AI Research
Sumit Chopra is a research scientist at the Facebook Artificial Intelligence Research Lab. He received a Ph.D. in computer science from New York University in 2008. His thesis proposed a first-of-its-kind neural network model for relational regression, and was a conceptual foundation for a startup company modeling residential real estate prices. Following his Ph.D., Sumit joined AT&T Labs – Research as a research scientist in the Statistics and Machine Learning Department, where he focused on building novel deep learning models for speech recognition, natural language processing, and computer vision. While at AT&T he also worked on other areas of machine learning, such as recommender systems, computational advertisement, and ranking. He has been a research scientist at Facebook AI Research since April 2014, where he has focused primarily on natural language understanding.
Related Workshops
There has been a series of “Learning Semantics” workshops in recent years that touch upon these subjects, but our workshop is more focused, which we hope will generate greater interaction and discussion.
Similarly, there has been a series of deep learning workshops over the years, e.g. last year’s “Deep Learning and Representation Learning” workshop. Deep learning is very broad, and this year it is the subject of a symposium. Our workshop focuses on a smaller area that has gained substantial interest (see references above).