We have a new paper, written with the DeepMind team (John Hale, Chris Dyer, Adhi Kuncoro) and now also with Keith Hall at Google Research, about RNNGs and EEG data. Training the RNNG on larger corpora improves its fit to the neural data, but only when the training corpora come from the same genre as the stimulus presented to human participants.
Hale, J. T., Kuncoro, A., Hall, K. B., Dyer, C., & Brennan, J. R. (2019). Text Genre and Training Data Size in Human-Like Parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
Abstract
Domain-specific training typically makes NLP systems work better. We show that this extends to cognitive modeling as well by relating the states of a neural phrase-structure parser to electrophysiological measures from human participants. These measures were recorded as participants listened to a spoken recitation of the same literary text that was supplied as input to the neural parser. Given more training data, the system derives a better cognitive model — but only when the training examples come from the same textual genre. This finding is consistent with the idea that humans adapt syntactic expectations to particular genres during language comprehension (Kaan and Chun, 2018; Branigan and Pickering, 2017).
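To make the linking step in the abstract concrete, here is a minimal sketch of the general idea of relating parser-derived quantities to EEG measures. It is not the paper's actual pipeline: the predictor names, the synthetic data, and the use of a plain linear regression are all placeholder assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical illustration: word-by-word predictors derived from a
# trained parser (e.g. surprisal) regressed against EEG amplitudes
# time-locked to the same words of the stimulus text.
rng = np.random.default_rng(0)
n_words = 500

# Placeholder predictors -- in a real analysis these would come from the
# parser's states for each word, plus nuisance covariates like frequency.
parser_surprisal = rng.gamma(shape=2.0, scale=1.5, size=n_words)
word_frequency = rng.normal(size=n_words)
X = np.column_stack([parser_surprisal, word_frequency])

# Placeholder EEG response per word (e.g. mean amplitude in a fixed
# window after word onset at one electrode), simulated here.
eeg_amplitude = 0.4 * parser_surprisal + rng.normal(scale=1.0, size=n_words)

# Fit the regression and report how well the parser-based predictors
# account for the per-word EEG response.
model = LinearRegression().fit(X, eeg_amplitude)
print("R^2 of parser-based model:", r2_score(eeg_amplitude, model.predict(X)))
```

Comparing this kind of fit across parsers trained on corpora of different sizes and genres is, at a high level, how one would ask whether more (or more genre-matched) training data yields a better cognitive model.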