The GENIA corpus: an annotated research abstract corpus in molecular biology domain (paper)
2002, cited by 259
Biomedical Text Mining and Ontologies; Semantic Web and Ontologies; Genomics and Phylogenetic Studies
Abstract
With the information overload in genome-related fields, there is an increasing need for natural language processing (NLP) technology to extract information from the literature, and various attempts at information extraction using NLP have been made. We are developing the necessary resources, including a domain ontology and an annotated corpus built from research abstracts in the MEDLINE database (the GENIA corpus). We are building the ontology and the corpus simultaneously, using each to develop the other. In this paper we report on our new corpus, its ontological basis, its annotation scheme, and statistics of the annotated objects. We also describe the tools used for corpus annotation and management.