Creating Training Corpora for NLG Micro-Planners (paper)

2017 · Cited by 376

Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Abstract

In this paper, we present a novel framework for semi-automatically creating linguistically challenging data-to-text corpora for microplanning from existing Knowledge Bases (KBs). Because our method pairs data of varying size and shape with texts ranging from simple clauses to short texts, a dataset created using this framework provides a challenging benchmark for microplanning. Another feature of this framework is that it can be applied to any large-scale knowledge base and can therefore be used to train KB verbalisers. We apply our framework to DBpedia data and compare the resulting dataset with that of Wen et al. (2016). We show that while Wen et al.'s dataset is more than twice as large as ours, it is less diverse both in terms of input and in terms of text. We thus propose our corpus generation framework as a novel method for creating challenging datasets from which to learn NLG models capable of handling the complex interactions that occur during microplanning between lexicalisation, aggregation, surface realisation, referring expression generation and sentence segmentation. To encourage researchers to take up this challenge, we have made available a dataset created using this framework in the context of the WEBNLG shared task.
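To make the abstract's notion of input "size" and "shape" concrete, here is a minimal sketch of a data-to-text entry. It is not the authors' framework: the `DataTextPair` class, the `input_shape` helper and its four shape labels (single, sibling, chain, mixed) are hypothetical, and size is assumed to mean the number of (subject, property, object) triples in an input.

```python
from collections import Counter
from dataclasses import dataclass

# An RDF-style KB triple: (subject, property, object).
Triple = tuple[str, str, str]


@dataclass
class DataTextPair:
    """Hypothetical corpus entry: a set of KB triples and a text verbalising them."""
    triples: list[Triple]
    text: str

    @property
    def size(self) -> int:
        # Input size, assumed here to be the number of triples.
        return len(self.triples)


def input_shape(triples: list[Triple]) -> str:
    """Classify the topology of a triple set (a hypothetical heuristic).

    single:  exactly one triple
    sibling: every triple shares the same subject
    chain:   the triples form one path, each object being the next triple's subject
    mixed:   anything else
    """
    if not triples:
        raise ValueError("an input needs at least one triple")
    if len(triples) == 1:
        return "single"
    if len({s for s, _, _ in triples}) == 1:
        return "sibling"
    out_deg = Counter(s for s, _, _ in triples)
    in_deg = Counter(o for _, _, o in triples)
    starts = set(out_deg) - set(in_deg)  # entities that are never an object
    if max(out_deg.values()) == 1 and max(in_deg.values()) == 1 and len(starts) == 1:
        # Follow the path from its unique start; it is a chain only if it uses every triple.
        next_node = {s: o for s, _, o in triples}
        node, steps = starts.pop(), 0
        while node in next_node and steps < len(triples):
            node = next_node[node]
            steps += 1
        if steps == len(triples):
            return "chain"
    return "mixed"


pair = DataTextPair(
    triples=[("Ada_Lovelace", "birthPlace", "London"),
             ("London", "country", "United_Kingdom")],
    text="Ada Lovelace was born in London, United Kingdom.",
)
print(pair.size, input_shape(pair.triples))  # 2 chain
```

Tagging each entry with a size and a shape like this is one simple way to check that a corpus covers inputs of varying complexity rather than mostly single-triple ones.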

Authors (4)

Laura Perez-Beltrachini

Shashi Narayan

Anastasia Shimorina

Claire Gardent
