July 14, 2015

Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets

By: Armand Joulin, Tomas Mikolov

In recent years, machine learning has made a lot of progress in different application domains such as computer vision and speech recognition. Most of this success has been achieved by scaling up existing methods to leverage the information contained in big data. In particular, models tailored for sequential prediction have shown strong results in challenging natural language problems such as language modeling. With the recent success of deep learning, one can ask how far we are from developing real artificial intelligence, as shown in science fiction movies.

In our recent work, we show that certain simple sequential patterns cannot be learned by these popular deep learning approaches. For example, standard artificial neural networks cannot learn simple fundamental concepts such as memorization of a sequence of symbols. Common machine learning approaches cannot learn these concepts because they rely mostly on remembering previously seen patterns that appear frequently in the training data.

We propose a novel sequence prediction approach that has the capability to learn these simple concepts. The models can grow in complexity in a learned, structured way when facing more challenging tasks. This is achieved by extending a neural network with a trainable memory structure that can capture and represent simple algorithms. Our model can be trained with standard continuous optimization techniques, such as stochastic gradient descent and error backpropagation.

The model includes a memory structure that learns to operate in a way that solves the training tasks. In the case of a push-down stack memory, the model learns which values it should push onto the top of the stack, and when it needs to pop (remove) the topmost elements from the stack. A simple extension is to use multiple stacks: while a model with just two stacks is already computationally equivalent to a Turing machine, and thus able to represent any algorithm, it is often easier for the optimization process to train models that have many stacks. Extending neural networks with this simple structured memory is already sufficient to learn certain simple concepts such as counting and memorization.
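To give a concrete flavor of how such a memory can be trained with gradient descent, here is a minimal sketch of a continuous (differentiable) stack update, in the spirit of the mechanism described above. The function names, the fixed-depth stack, and the three-way PUSH/POP/NO-OP action set are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def stack_update(stack, action_logits, push_value):
    """One step of a continuous stack (illustrative sketch).

    stack:         1-D array; stack[0] is the top element.
    action_logits: scores for the actions [PUSH, POP, NO-OP],
                   produced by the controller network.
    push_value:    the scalar the controller may push on top.

    The new stack is a convex combination of the three discrete
    outcomes, weighted by the action probabilities, so the whole
    operation stays differentiable and trainable by backpropagation.
    """
    a = softmax(np.asarray(action_logits, dtype=float))
    pushed = np.concatenate(([push_value], stack[:-1]))  # shift down, new top
    popped = np.concatenate((stack[1:], [0.0]))          # shift up, pad bottom
    return a[0] * pushed + a[1] * popped + a[2] * stack

# A near-deterministic PUSH followed by a near-deterministic POP
# leaves the stack (almost exactly) back where it started.
s = np.zeros(4)
s = stack_update(s, [10.0, -10.0, -10.0], 0.9)  # push 0.9: top is ~0.9
s = stack_update(s, [-10.0, 10.0, -10.0], 0.0)  # pop: top is ~0.0 again
```

During training the action logits are soft, so gradients flow through every possible stack configuration; as learning converges, the actions typically become close to discrete.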

In the paper, we also demonstrate a non-trivial example of what the model can learn. The input sequence consists of binary numbers, where the third number is always the sum of the previous two.
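A training sequence for this task might be generated as follows. The token format shown here (digits with `+` and `=` separators) is an illustrative assumption; the paper's sequences encode the same a + b = c regularity.

```python
import random

def binary_addition_example(max_bits=8):
    """Generate one training string: two random binary numbers
    followed by their sum, e.g. '101+11=1000'."""
    a = random.randint(0, 2 ** max_bits - 1)
    b = random.randint(0, 2 ** max_bits - 1)
    return f"{a:b}+{b:b}={a + b:b}"

# Evaluating on more bits than seen in training tests whether the
# model learned the carry algorithm rather than memorized patterns.
print(binary_addition_example(max_bits=4))
```

To predict the digits after `=`, the model has to reproduce the binary addition algorithm, carries included, rather than match surface patterns.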

Understanding this regularity requires using several stacks in a complex and very precise way. After the model is successfully trained, it can generalize to much longer sequences than those seen during training, because it learns how to represent the algorithm.

There are several reasons why we are excited about these results. It should be noted that a similar line of research was pursued by several research groups already in the 1990s; however, the success was somewhat limited and the solutions often involved ad hoc tricks. It is possible that the main reason we have succeeded is the much higher computational power of current computers, which allowed us to train bigger models for much longer than was possible twenty years ago.

The ability of the model to grow in complexity is an exciting property that many standard machine learning techniques, such as standard neural networks, do not possess. While it looks simple to a human to add binary numbers longer than those the machine was trained on, we believe this can go much further in the future. Since the machine can essentially learn how to operate its memory, it can learn how to program itself by producing sequences of instructions. A machine that can learn how to grow in complexity when facing new, more difficult problems certainly seems important for the development of true AI. However, we are still clearly very far from achieving this ultimate goal.

To promote this line of research further, we are publishing the code that was used to produce the above results: https://github.com/facebook/Stack-RNN

A more detailed description can be found in the paper: http://arxiv.org/pdf/1503.01007.pdf

Armand Joulin, Tomas Mikolov
Facebook Artificial Intelligence Research