GlossAssist -- A Tool to Simplify Corpus Creation and Study the Effect of NLP Models in Low-Resource Documentation Settings

ArXiv CS.CL | 2026-06-04 | NEWS | en | Authors: Bhargav Shandilya, Matt Buchholz, Alexis Palmer

Details

Source site
ArXiv CS.CL
Authors
Bhargav Shandilya, Matt Buchholz, Alexis Palmer
Article type
NEWS
Language
en
Publication date
2026-06-04

Abstract

arXiv:2606.04367v1 (Announce Type: new)

Interlinear glossed text (IGT) is the standard format for linguistic annotation in language documentation. Producing it manually, however, is often slow and costly. Automated glossing systems have improved substantially in recent years, but adoption among field linguists remains limited. Existing tools are designed to be evaluated rather than used, offering no interpretable path for correction or the incorporation of linguistic expertise back into model behavior. We present GlossAssist, a glossing tool built around the retrieval-based architecture of CWoMP (Contrastive Word-Morpheme Pre-training), which grounds predictions in a mutable lexicon of learned morpheme representations. In conjunction with CWoMP, our system treats each correction by an annotator as part of an active learning setting, which expands the lexicon and improves future predictions without having to retrain the model.
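The idea of a mutable lexicon that grows through annotator corrections, with no retraining step, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from GlossAssist or CWoMP: the `MutableLexicon` class, its `add`, `predict`, and `correct` methods, the `min_score` threshold, and especially `embed`, which is a toy character-bigram vector standing in for a learned morpheme representation. The lexicon entries are made-up toy data.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a learned encoder: counts of padded character bigrams."""
    padded = f"#{text}#"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_sq = sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


class MutableLexicon:
    """Hypothetical retrieval lexicon: glosses are looked up, not generated."""

    def __init__(self, min_score: float = 0.3):
        self.min_score = min_score  # below this, abstain instead of guessing
        self.entries = []           # list of (vector, morpheme, gloss)

    def add(self, morpheme: str, gloss: str) -> None:
        self.entries.append((embed(morpheme), morpheme, gloss))

    def predict(self, morpheme: str):
        """Return (gloss, score) of the nearest entry, or (None, score)."""
        if not self.entries:
            return None, 0.0
        query = embed(morpheme)
        score, gloss = max(
            ((cosine(query, vec), g) for vec, _, g in self.entries),
            key=lambda pair: pair[0],
        )
        return (gloss, score) if score >= self.min_score else (None, score)

    def correct(self, morpheme: str, gloss: str) -> None:
        """An annotator correction is just a new entry; no model weights change."""
        self.add(morpheme, gloss)


lexicon = MutableLexicon()
lexicon.add("kitap", "book")
lexicon.add("lar", "PL")

print(lexicon.predict("ler"))   # nearest neighbour is the "lar" entry
print(lexicon.predict("da"))    # nothing similar enough, so the lexicon abstains
lexicon.correct("da", "LOC")    # annotator supplies the gloss
print(lexicon.predict("da"))    # the correction is retrieved on the next lookup
```

In this sketch the only state that changes after a correction is the list of entries, which is what makes each prediction traceable to a specific lexicon item. A real system would presumably replace `embed` with the trained encoder and the linear scan with an indexed nearest-neighbour search.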
