Tabula nearly rasa: Probing the linguistic knowledge of character-level neural language models trained on unsegmented text

Topology, Algebra and Categories in Logic (TACL)


Recurrent neural networks (RNNs) have reached striking performance in many natural language processing tasks. This has renewed interest in whether these generic sequence processing devices are inducing genuine linguistic knowledge. Nearly all current analytical studies, however, initialize the RNNs with a vocabulary of known words, and feed them tokenized input during training. We present a multi-lingual study of the linguistic knowledge encoded in RNNs trained as character-level language models, on input data with word boundaries removed. These networks face a tougher and more cognitively realistic task, having to discover any useful linguistic unit from scratch based on input statistics. The results show that our “near tabula rasa” RNNs are mostly able to solve morphological, syntactic and semantic tasks that intuitively pre-suppose word-level knowledge, and indeed they learned, to some extent, to track word boundaries. Our study opens the door to speculations about the necessity of an explicit, rigid word lexicon in language learning and usage.

Related Publications

All Publications

SIGGRAPH - August 9, 2021

Control Strategies for Physically Simulated Characters Performing Two-player Competitive Sports

Jungdam Won, Deepak Gopinath, Jessica Hodgins

CVPR - June 20, 2021

Ego-Exo: Transferring Visual Representations from Third-person to First-person Videos

Yanghao Li, Tushar Nagarajan, Bo Xiong, Kristen Grauman

ICML - July 18, 2021

Align, then memorise: the dynamics of learning with feedback alignment

Maria Refinetti, Stéphane d'Ascoli, Ruben Ohana, Sebastian Goldt

CVPR - June 19, 2021

Intentonomy: a Dataset and Study towards Human Intent Understanding

Menglin Jia, Zuxuan Wu, Austin Reiter, Claire Cardie, Serge Belongie, Ser-Nam Lim

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookies Policy