A large-scale pipeline for LLM-assisted corpus annotation: variation and change in the English consider construction

ArXiv CS.CL | 2026-06-16 | NEWS | en | Authors: Cameron Morin, Matti Marttinen Larsson

Details

Source site: ArXiv CS.CL
Authors: Cameron Morin, Matti Marttinen Larsson
Article type: NEWS
Language: en
Publication date: 2026-06-16

Abstract

arXiv:2510.12306v3 (announce type: replace)

As natural language corpora expand at an unprecedented rate, manual annotation remains a significant methodological bottleneck in corpus linguistic work. We address this challenge by presenting a scalable pipeline for automating grammatical annotation in voluminous corpora using large language models (LLMs). Unlike previous supervised and iterative approaches, our method employs a four-phase workflow: prompt engineering, pre-hoc evaluation, automated batch processing, and post-hoc validation. We demonstrate the pipeline's accessibility and effectiveness through a diachronic case study of variation in the English evaluative consider construction (consider X as/to be/Ø Y). We annotate 143,933 'consider' concordance lines from the Corpus of Historical American English (COHA) via the OpenAI API in under 60 hours, achieving 98%+ accuracy on two sophisticated annotation procedures.
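The four-phase workflow named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the prompt wording, the label set ("as", "to be", "zero", "other"), and the 0.95 threshold are all assumptions made for illustration. The model call is replaced by a crude regex stub so that the example runs offline; in a real pipeline the `annotate` callable would wrap a request to an LLM API.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical label set for the complement of "consider" (an assumption).
LABELS = ("as", "to be", "zero", "other")


def build_prompt(line: str) -> str:
    """Phase 1 (prompt engineering): hypothetical prompt wording."""
    return (
        "Classify the complement of 'consider' in this concordance line "
        f"as one of: {', '.join(LABELS)}.\nLine: {line}\nLabel:"
    )


def stub_annotator(prompt: str) -> str:
    """Stand-in for an LLM call: a crude regex heuristic, NOT a real model."""
    line = prompt.split("Line: ", 1)[1].rsplit("\nLabel:", 1)[0]
    match = re.search(r"\bconsider(?:s|ed|ing)?\b(.*)", line, flags=re.IGNORECASE)
    if match is None:
        return "other"
    tail = match.group(1)
    if re.search(r"\bto be\b", tail):
        return "to be"
    if re.search(r"\bas\b", tail):
        return "as"
    return "zero"


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Share of predictions that match the hand-assigned labels."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("need equally sized, non-empty label lists")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def annotate_batch(
    lines: List[str], annotate: Callable[[str], str], batch_size: int = 2
) -> List[str]:
    """Phase 3 (automated batch processing): label lines chunk by chunk."""
    labels: List[str] = []
    for start in range(0, len(lines), batch_size):
        for line in lines[start:start + batch_size]:
            labels.append(annotate(build_prompt(line)))
    return labels


def run_pipeline(
    corpus: List[str],
    pilot: List[Tuple[str, str]],
    audit: Dict[int, str],
    annotate: Callable[[str], str],
    threshold: float = 0.95,  # arbitrary illustrative cut-off
) -> Tuple[List[str], float, float]:
    # Phase 2 (pre-hoc evaluation): test the prompt on a hand-labelled pilot set.
    pilot_lines = [text for text, _ in pilot]
    pilot_gold = [label for _, label in pilot]
    pre_acc = accuracy(annotate_batch(pilot_lines, annotate), pilot_gold)
    if pre_acc < threshold:
        raise RuntimeError(f"pilot accuracy {pre_acc:.2f} is below {threshold}")
    # Phase 3: annotate the full corpus.
    labels = annotate_batch(corpus, annotate)
    # Phase 4 (post-hoc validation): compare against hand-checked corpus indices.
    audited = sorted(audit)
    post_acc = accuracy([labels[i] for i in audited], [audit[i] for i in audited])
    return labels, pre_acc, post_acc


# Invented example sentences, not COHA data.
corpus = [
    "They consider him to be a genius.",
    "She considered the plan as a failure.",
    "We consider this proposal reasonable.",
    "I regard it highly.",
]
pilot = [
    ("He considers it to be fair.", "to be"),
    ("They considered her as a friend.", "as"),
]
audit = {0: "to be", 2: "zero"}

labels, pre_acc, post_acc = run_pipeline(corpus, pilot, audit, stub_annotator)
print(labels, pre_acc, post_acc)
```

Passing the annotator in as a callable keeps the evaluation and validation logic independent of any particular model or API client, so the stub can be swapped for a real LLM call without touching the rest of the sketch.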
