Manual and automatic evaluation of summaries (Paper)

2002 · Citations: 265
Natural Language Processing Techniques · Semantic Web and Ontologies · Speech and dialogue systems

Abstract

In this paper we discuss manual and automatic evaluations of summaries using data from the Document Understanding Conference 2001 (DUC-2001). We first show the instability of the manual evaluation. Specifically, the low inter-human agreement indicates that more reference summaries are needed. To investigate the feasibility of automated summary evaluation based on the recent BLEU method from machine translation, we use accumulative n-gram overlap scores between system and human summaries. The initial results provide encouraging correlations with human judgments, based on the Spearman rank-order correlation coefficient. However, relative ranking of systems needs to take into account the instability.
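
The two ingredients named in the abstract, n-gram overlap between a system summary and a human summary and a Spearman rank-order correlation against human judgments, can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact scoring formula, so everything specific here is an assumption made for illustration: whitespace tokenization, clipped matches divided by the number of reference n-grams, and an unweighted mean over n = 1..4 standing in for the "accumulative" score. The function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(system, reference, n):
    """Clipped n-gram matches divided by the number of reference n-grams.

    Assumption: lowercased whitespace tokens; not the paper's exact formula.
    """
    sys_counts = ngrams(system.lower().split(), n)
    ref_counts = ngrams(reference.lower().split(), n)
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, sys_counts[gram]) for gram, count in ref_counts.items())
    return matched / total


def accumulated_overlap(system, reference, max_n=4):
    """Unweighted mean of the overlap for n = 1..max_n (assumed combination)."""
    return sum(ngram_overlap(system, reference, n) for n in range(1, max_n + 1)) / max_n


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when one ranking is constant")
    return cov / (sd_x * sd_y)
```

To compare an automatic metric with manual evaluation in this setup, one would compute `accumulated_overlap` for each system (averaged over its summaries), collect the corresponding human scores in the same system order, and pass the two lists to `spearman`.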
