LC-QuAD: A Corpus for Complex Question Answering over Knowledge Graphs (Paper)

2017 | Lecture Notes in Computer Science | Citations: 245

Details

Journal/Conference
Lecture Notes in Computer Science
Publication date
2017-01-01
Publication year
2017

Keywords

Topic Modeling; Advanced Graph Neural Networks; Natural Language Processing Techniques

Abstract

Being able to access knowledge bases in an intuitive way has been an active area of research in recent years. In particular, several question answering (QA) approaches that allow RDF datasets to be queried in natural language have been developed, as they let end users access knowledge without having to learn the schema of a knowledge base or a formal query language. To foster this research area, several training datasets have been created, e.g. in the QALD (Question Answering over Linked Data) initiative. However, existing datasets are insufficient in size, variety, or complexity to apply and evaluate a range of machine-learning-based QA approaches for learning complex SPARQL queries. With the Large-Scale Complex Question Answering Dataset (LC-QuAD), we close this gap by providing a dataset of 5000 questions and their corresponding SPARQL queries over the DBpedia dataset. In this article, we describe the dataset creation process and how we ensure a high variety of questions, which should make it possible to assess the robustness and accuracy of the next generation of QA systems for knowledge graphs.
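
To make the idea of a question paired with its SPARQL query concrete, here is a minimal Python sketch. The `QAPair` type, the `count_triple_patterns` helper, and the example record are all hypothetical and written for illustration only: the record is not taken from LC-QuAD, and the abstract does not specify the dataset's file format or how query complexity is measured. Counting triple patterns is used here only as a rough, assumed proxy for "complexity".

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    """One natural-language question with its SPARQL query (hypothetical layout)."""
    question: str
    sparql: str


def count_triple_patterns(sparql: str) -> int:
    """Crude complexity proxy: count triple patterns inside the outermost { ... }.

    Assumes a flat group whose patterns are separated by a whitespace-delimited
    dot; it does not handle nested groups, OPTIONAL, or FILTER.
    """
    match = re.search(r"\{(.*)\}", sparql, flags=re.DOTALL)
    if match is None:
        return 0
    body = match.group(1)
    # Split on " . " separators; dots inside IRIs are not surrounded by whitespace.
    fragments = re.split(r"\s\.(?:\s|$)", body)
    return len([f for f in fragments if f.strip()])


# Illustrative record only (not from the dataset): a two-hop question over DBpedia.
EXAMPLE = QAPair(
    question="Which films were directed by someone born in London?",
    sparql=(
        "SELECT DISTINCT ?film WHERE { "
        "?film <http://dbpedia.org/ontology/director> ?person . "
        "?person <http://dbpedia.org/ontology/birthPlace> "
        "<http://dbpedia.org/resource/London> . "
        "}"
    ),
)
```

Calling `count_triple_patterns(EXAMPLE.sparql)` on this record yields 2, reflecting the two-hop structure of the question; a single-fact question would typically map to one pattern.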