Two papers on probing LLMs for linguistic interpretability

Author: Jonathan Brennan

Published: August 3, 2025

Linyang He (UMich MA 2024, now a Columbia PhD student) leads a pair of papers that extend methods for probing the internal states of large language models. Building on his earlier MA work, the key idea is to combine a linear decoder with carefully curated linguistic minimal pairs. These papers advance the approach to conceptual phenomena alongside grammatical ones, while also extending it across multiple languages and language models.
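The core move can be sketched in a few lines. This is a minimal illustration, not the papers' actual pipeline: it uses synthetic vectors in place of real LLM layer activations, and an arbitrary separation strength, but it shows the logic of training a linear decoder to distinguish the two members of a minimal pair from their internal representations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins for layer activations of minimal pairs:
# assume acceptable vs. unacceptable sentences differ along one
# latent direction (the separation strength 1.5 is arbitrary).
dim, n = 64, 200
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)

base = rng.normal(size=(n, dim))
acceptable = base + 1.5 * direction    # e.g. "the cat sleeps"
unacceptable = base - 1.5 * direction  # e.g. "the cat sleep"

X = np.vstack([acceptable, unacceptable])
y = np.array([1] * n + [0] * n)

# The linear decoder: if a simple hyperplane separates the two
# conditions, the representation encodes the contrast linearly.
probe = LogisticRegression(max_iter=1000).fit(X, y)
accuracy = probe.score(X, y)
print(f"probe accuracy: {accuracy:.2f}")
```

In the real setting, `X` would hold activations from a specific layer of an LLM, and fitting a probe per layer reveals where in the network a given contrast becomes linearly decodable.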

The first paper introduces a cross-linguistic benchmark of conceptual minimal pairs.

He, L., Nie, E., Dindar, S. S., Firoozi, A., Florea, A., Nguyen, V., Puffay, C., Shimizu, R., Ye, H., Brennan, J., Schmid, H., Schütze, H., & Mesgarani, N. (2025, February 27). XCOMPS: A Multilingual Benchmark of Conceptual Minimal Pairs. SIGTYP 2025. https://doi.org/10.48550/arXiv.2502.19737


The second paper uses linear decoding with minimal pairs to compare the layerwise encoding of conceptual versus grammatical representations, and contrasts this probing method with "behavior"-oriented approaches to LLM interpretability.

He, L., Nie, E., Schmid, H., Schütze, H., Mesgarani, N., & Brennan, J. (2025, February 25). Large Language Models as Neurolinguistic Subjects: Discrepancy in Performance and Competence for Form and Meaning. Findings of ACL 2025.

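The behavioral/probing contrast the second paper draws can be sketched with synthetic numbers. A behavioral ("performance") evaluation asks whether the model assigns higher probability to the well-formed member of each minimal pair; a probing ("competence") evaluation asks whether the contrast is decodable from internal states even when output behavior misses it. The log-probabilities below are simulated under stated assumptions, not drawn from any model in the papers.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pairs = 100

# Assumption: a real evaluation would sum token log-probs from an LLM
# for each sentence. Here we simulate a modest preference (mean gap 2
# nats) for the acceptable member of each pair.
logp_acceptable = rng.normal(loc=-40.0, scale=3.0, size=n_pairs)
logp_unacceptable = logp_acceptable - rng.normal(loc=2.0, scale=1.0, size=n_pairs)

# Behavioral score: how often the model "prefers" the well-formed
# sentence, i.e. assigns it the higher total log-probability.
behavioral_accuracy = float(np.mean(logp_acceptable > logp_unacceptable))
print(f"behavioral accuracy: {behavioral_accuracy:.2f}")
```

A gap between this behavioral score and the accuracy of a linear probe on internal states is the kind of performance/competence discrepancy the paper's title refers to.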