Jump to better conclusions: SCAN both left and right

Empirical Methods in Natural Language Processing (EMNLP)


Lake and Baroni (2018) recently introduced the SCAN dataset, which consists of simple commands paired with action sequences and is intended to test the strong generalization abilities of recurrent sequence-to-sequence models. Their initial experiments suggested that such models may fail because they lack the ability to extract systematic rules. Here, we take a closer look at SCAN and show that it does not always capture the kind of generalization that it was designed for. To mitigate this, we propose a complementary dataset called NACS, which requires mapping actions back to the original commands. We show that models that do well on SCAN do not necessarily do well on NACS, and that NACS exhibits properties more closely aligned with realistic use cases for sequence-to-sequence models.
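To make the relationship between the two datasets concrete, the following is a minimal sketch of how a NACS-style pair could be derived from a SCAN-style pair by swapping source and target. The `IN: ... OUT: ...` line layout, the `I_JUMP` action token, and the helper names `parse_scan_line` and `to_nacs` are assumptions made for illustration; the abstract itself only states that SCAN pairs commands with action sequences and that NACS maps actions back to commands.

```python
from typing import List, Tuple

# A (source, target) pair of whitespace-separated token strings.
Pair = Tuple[str, str]


def parse_scan_line(line: str) -> Pair:
    """Split a line of the assumed form 'IN: <command> OUT: <actions>'."""
    body = line.strip()
    if not body.startswith("IN:") or " OUT:" not in body:
        raise ValueError(f"unexpected line format: {line!r}")
    command, actions = body[len("IN:"):].split(" OUT:", 1)
    return command.strip(), actions.strip()


def to_nacs(pairs: List[Pair]) -> List[Pair]:
    """Swap source and target so that actions map back to commands."""
    return [(actions, command) for command, actions in pairs]


# Hypothetical example lines, not taken from the released data.
scan_lines = [
    "IN: jump OUT: I_JUMP",
    "IN: jump twice OUT: I_JUMP I_JUMP",
]
scan_pairs = [parse_scan_line(line) for line in scan_lines]
nacs_pairs = to_nacs(scan_pairs)
```

In this sketch a SCAN model would read `scan_pairs` left to right (command in, actions out), while a NACS model would be trained on `nacs_pairs`, where the action sequence is the input and the command is the output.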

