Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English (Paper)

2013, cited by 346

Natural Language Processing Techniques; Topic Modeling; Text Readability and Simplification

Abstract

We describe the NUS Corpus of Learner English (NUCLE), a large, fully annotated corpus of learner English that is freely available for research purposes. The goal of the corpus is to provide a large data resource for the development and evaluation of grammatical error correction systems. Although NUCLE has been available for almost two years, there has been no reference paper that describes the corpus in detail. In this paper, we address this need. We describe the annotation schema and the data collection and annotation process of NUCLE. Most importantly, we report on an unpublished study of annotator agreement for grammatical error correction. Finally, we present statistics on the distribution of grammatical errors in the NUCLE corpus.

Authors

Siew Mei Wu

Hwee Tou Ng

Daniel Dahlmeier
