What is CommAI?

CommAI is a project that aims to develop general-purpose artificial agents that are useful for humans in their daily endeavours. CommAI is also an evaluation framework developed with that goal in mind, where a learning agent must communicate with a scripted teacher in order to solve a never-ending stream of tasks.

The CommAI framework has the following distinguishing features (see Baroni et al. 2017 for a more in-depth discussion):

  • Language-first: An AI will be useful to us only if we are able to communicate with it: assigning it tasks, understanding the information it returns, and teaching it new skills. Since natural language is by far the easiest way for us to communicate, we require our useful AI to be endowed with basic linguistic abilities. To this end, tasks are presented to the learning algorithm in the form of linguistic expressions, and the learner proposes its solution, in turn, using linguistic expressions.
  • Reward-based feedback: We should not have to explicitly inform the AI about every single action it must perform in order to solve a new task for us; otherwise we might just as well do it ourselves. For this reason, we choose to provide a reward signal to the learner whenever it successfully completes the task, rather than giving it explicit supervision on how it should have solved it.
  • Life-long learning: A useful AI should be flexible. As our needs change, the AI should help us with the new challenges we face: from solving a scientific problem in the morning at work to stocking our fridge at night. For this reason, rather than splitting the data to be learnt into train and test splits, as is commonly done today in Machine Learning, we expose the learning algorithm to a continuous stream of tasks, ultimately evaluating it by the average cumulative reward it achieves.
  • General symbol-stream interface: An AI will probably need to deal with many different input and output modalities. For this reason, the interface between the machine and the world should be maximally general. The machine itself should learn the best way to process different kinds of input and output streams, with no need for manual re-programming as we apply it to different domains. To achieve this, the learning algorithm communicates with the scripted teacher through a stream of bits or any other finite alphabet, exchanging one bit or symbol per time step.
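
The last three features can be pictured together in a few lines of Python. This is only a minimal sketch under stated assumptions, not the CommAI-env implementation: the names `run_task`, `average_reward` and `CopyLearner`, the three-phase task layout, and the choice to average reward per task are all hypothetical. It shows a learner that sees one symbol and one reward per time step, emits one symbol in return, and is scored over the whole stream of tasks rather than on a held-out test split.

```python
from typing import Callable, List, Tuple

SILENCE = " "  # assumed "nothing to say" symbol

# A learner maps (incoming symbol, incoming reward) to one outgoing symbol.
Learner = Callable[[str, int], str]


def run_task(prompt: str, expected: str, learner: Learner) -> int:
    """Play one task, exchanging exactly one symbol per time step."""
    # Phase 1: the teacher emits the prompt; the learner's output is ignored.
    for symbol in prompt:
        learner(symbol, 0)
    # Phase 2: the teacher stays silent and records the learner's answer.
    answer = "".join(learner(SILENCE, 0) for _ in expected)
    reward = 1 if answer == expected else 0
    # Phase 3: the reward (if any) is delivered on the following step.
    learner(SILENCE, reward)
    return reward


def average_reward(tasks: List[Tuple[str, str]], learner: Learner) -> float:
    """Average reward per task over the whole stream (no train/test split)."""
    total = sum(run_task(prompt, expected, learner) for prompt, expected in tasks)
    return total / len(tasks)


class CopyLearner:
    """Toy learner: buffers whatever follows '#' and replays it during silence."""

    def __init__(self) -> None:
        self.buffer: List[str] = []

    def __call__(self, symbol: str, reward: int) -> str:
        if symbol == "#":
            self.buffer = []  # a new instruction starts
            return SILENCE
        if symbol != SILENCE:
            self.buffer.append(symbol)
            return SILENCE
        return self.buffer.pop(0) if self.buffer else SILENCE


# Hypothetical stream: two plain copy tasks and one the toy learner cannot solve.
tasks = [("#@$", "@$"), ("#$@", "$@"), ("#@$*&", "@$@$")]
score = average_reward(tasks, CopyLearner())
```

The toy learner is hard-coded and never uses the reward; it is there only to exercise the interface. It solves the two copy tasks and fails the third, which requires interpreting the extra symbols.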

To illustrate the nature of the framework with an example, consider a dialog where the learner is prompted with the task “repeat AB”, the learner correctly produces the output sequence “AB”, and then the teacher says “well done” followed by a positive (+1) reward. Let’s suppose the instruction “repeat” is coded with a single symbol (#), A and B are also coded as symbols themselves (@ and $), and that both the learner and the teacher produce a “silence” space character while not emitting useful output. Then, this is what the task would look like from the learner’s point of view:

Teacher : #@$   %!-
Reward  : 000000001
Learner :     @$
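
The timing of this exchange can be reconstructed with a short sketch. The function name `build_trace` and the `delay` parameter (the number of silent steps before the learner starts answering) are assumptions made for illustration; the symbols are the ones used in the example above.

```python
SILENCE = " "  # the "silence" space character from the example


def build_trace(prompt: str, answer: str, praise: str, delay: int = 1):
    """Return (teacher, reward, learner) rows, one character per time step."""
    gap = delay + len(answer)  # steps during which the teacher stays silent
    teacher = prompt + SILENCE * gap + praise
    learner = (SILENCE * (len(prompt) + delay) + answer).ljust(len(teacher), SILENCE)
    # The +1 reward is assumed to arrive with the last symbol of the praise.
    reward = "0" * (len(teacher) - 1) + "1"
    return teacher, reward, learner


teacher, reward, learner = build_trace("#@$", "@$", "%!-")
print("Teacher :", teacher)
print("Reward  :", reward)
print("Learner :", learner)
```

Each column of the three rows corresponds to a single time step, so the learner only ever observes one teacher symbol and one reward value at a time.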

In this case the learner produced the correct sequence and was consequently rewarded. Obviously though, except for some potentially useful inductive bias (an innate preference that makes some sequences more likely than others), the learner has no principled way to discover the solution other than exhaustively searching the full space of possible sequences. While this approach is conceivable for short enough sequences, it quickly becomes intractable as the solution becomes longer and more complex. Imagine now that the learner is prompted with the following new task: “repeat AB two times”. Again, assuming that “two” and “times” are encoded with unique symbols, a learner would be prompted, for example, with a string looking like #@$*& (recall #@$ already meant “repeat AB”, while the new symbols *& could encode “two” + “times”). Now suppose that, after many attempts, the learner does discover the correct output @$@$. Then, the next time it is prompted with a similar input, say, for example, #@$^&, a fast learner could already exploit the hypothesis that the symbol # maps to a concept analogous to “repeat” or “copy” while the symbol & is a sort of marker indicating that the preceding symbol is a quantity. In this case, the ^ symbol stands for the number 5, and so the learner will at some point hit the correct solution:

Teacher : #@$^&         %!-
Reward  : 00000000000000001
Learner :     @$@$@$@$@$
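
Written out as code, the hypotheses the learner has accumulated by this point might look like the sketch below. It is a hand-written illustration, not a learning algorithm: the table `QUANTITIES` and the function `solve` are hypothetical names, and the table simply records what the learner could have inferred after both rewarded episodes.

```python
REPEAT = "#"  # hypothesised to mean "repeat" / "copy"
TIMES = "&"   # hypothesised marker: the preceding symbol is a quantity
QUANTITIES = {"*": 2, "^": 5}  # inferred from the two rewarded episodes


def solve(prompt: str) -> str:
    """Produce the output sequence for a prompt under the hypotheses above."""
    if not prompt.startswith(REPEAT):
        raise ValueError("unknown instruction")
    body = prompt[1:]
    count = 1  # a plain "repeat X" means one copy
    if len(body) >= 2 and body.endswith(TIMES):
        count = QUANTITIES[body[-2]]  # KeyError if the quantity is still unknown
        body = body[:-2]
    return body * count


outputs = [solve(p) for p in ("#@$", "#@$*&", "#@$^&")]
```

A prompt with an unseen quantity symbol raises a `KeyError` here, which corresponds to the situation where the learner still has to search for the right number of repetitions.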

Note how much the learner can now extrapolate from the observed information. For example, it can now associate the symbol * with the quantity 2 and ^ with 5, which could also be exploited in other tasks. Moreover, it could potentially learn that the sequence that encodes “well done” (%!-) itself denotes a reward signal, given that it only appears preceding the numerical reward (we have not seen examples where the learner gives the wrong output here, but you can imagine we won’t tell the learner “well done” in those situations). This latter association would enable the learner to develop an intrinsic reward mechanism to gain feedback from the linguistic input even in the absence of extrinsic reward!
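
One simple way to picture how such an association could be detected is to count, for every short window of teacher symbols, how often it ends on a rewarded time step. The sketch below does this for the two traces shown above; the function `cue_reward_rates` and the fixed window length of three symbols are assumptions made for illustration, and a real learner would not be handed the window length in advance.

```python
from collections import Counter
from typing import Dict, List, Tuple


def cue_reward_rates(episodes: List[Tuple[str, str]], n: int = 3) -> Dict[str, float]:
    """For each n-symbol teacher window, the fraction of times it ends on a reward."""
    seen: Counter = Counter()
    rewarded: Counter = Counter()
    for teacher, reward in episodes:
        for end in range(n, len(teacher) + 1):
            window = teacher[end - n:end]
            if " " in window:
                continue  # ignore windows that contain silence
            seen[window] += 1
            if reward[end - 1] == "1":
                rewarded[window] += 1
    return {window: rewarded[window] / seen[window] for window in seen}


# The two (teacher, reward) traces from the examples above.
episodes = [
    ("#@$   %!-", "000000001"),
    ("#@$^&         %!-", "00000000000000001"),
]
rates = cue_reward_rates(episodes)
```

On these two traces, `%!-` is the only window that always coincides with a reward, while instruction fragments such as `#@$` never do. With failed episodes in the stream as well, the same counts would indicate whether the cue remains a reliable stand-in for the reward.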

The string repetition task might seem superficially very simple. However, when the instructions are obfuscated by replacing known words with arbitrary symbol sequences (which is inevitably how the learner would see the English language at the beginning of its lifetime), we see that understanding them is actually not such an easy feat. Still, by exploring good hypotheses about the meaning of the symbols, the learner could manage to decode a growing number of them, and this should become easier with time as its knowledge increases. As a final note, we saw that the learning algorithm could learn to pick up linguistic cues to understand that it is doing the right thing even without receiving any reward at all, which could potentially guide it on more complex tasks that require some intermediate form of feedback.

The CommAI environment

To facilitate research under this framework we introduced CommAI-env, an environment where the experimenter can create datasets for life-long learning by scripting tasks that interact with the learner through a bidirectional communication channel. The teacher can be programmed in an arbitrary fashion through primitives that provide convenient abstractions to send messages to the learner, read back its responses and reward it appropriately. For their part, learners can be written in any programming language or ML paradigm and be evaluated for their ability to maximize the average reward. To find out more about it, please visit our GitHub page:


Tomas Mikolov, Armand Joulin, and Marco Baroni, A Roadmap towards machine intelligence. arXiv:1511.08130

Marco Baroni, Armand Joulin, Allan Jabri, Germán Kruszewski, Angeliki Lazaridou, Klemen Simonic, and Tomas Mikolov, CommAI: Evaluating the first steps towards a useful general AI. arXiv:1701.08954

User Group

User group for the CommAI-env platform

Contributing Researchers

Tomas Mikolov
Marco Baroni
Allan Jabri
Armand Joulin
Germán Kruszewski
Moustapha Cissé
Klemen Simonic
Amaç Herdagdelen

MAIN (MAchine INtelligence) Workshop Series


General AI Challenge

The “General AI Challenge” organized by GoodAI is based on CommAI.