SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization (paper)

2019, 327 citations

Topics: Topic Modeling, Natural Language Processing Techniques, Speech and dialogue systems

Abstract

This paper introduces the SAMSum Corpus, a new dataset with abstractive dialogue summaries. We investigate the challenges it poses for automated summarization by testing several models and comparing their results with those obtained on a corpus of news articles. We show that model-generated summaries of dialogues achieve higher ROUGE scores than model-generated summaries of news, in contrast with human evaluators' judgement. This suggests that the challenging task of abstractive dialogue summarization requires dedicated models and non-standard quality measures. To our knowledge, our study is the first attempt to introduce a high-quality chat-dialogue corpus, manually annotated with abstractive summaries, which can be used by the research community for further studies.

Authors (4)

Aleksander Wawer

Maciej Biesek

Iwona Mochol

Bogdan Gliwa
