Exploiting dictionaries in named entity extraction (paper)

Published 2004; cited 225 times

Topics: Topic Modeling; Natural Language Processing Techniques; Data Quality and Management

Abstract

We consider the problem of improving named entity recognition (NER) systems by using external dictionaries; more specifically, the problem of extending state-of-the-art NER systems by incorporating information about the similarity of extracted entities to entities in an external dictionary. This is difficult because most high-performance NER systems operate by sequentially classifying words as to whether or not they participate in an entity name, whereas the most useful similarity measures score entire candidate names. To correct this mismatch, we formalize a semi-Markov extraction process, which is based on sequentially classifying segments of several adjacent words rather than single words. In addition to allowing a natural way of coupling high-performance NER methods and high-performance similarity functions, this formalism also allows the direct use of other useful entity-level features, and provides a more natural formulation of the NER problem than sequential word classification. Experiments in multiple domains show that the new model can substantially improve extraction performance over previous methods for using external dictionaries in NER.
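To make the segment-level idea concrete, the following is a minimal sketch of semi-Markov decoding in which each candidate segment, rather than each word, receives a score that can depend on its similarity to a dictionary entry. It is an illustration under stated assumptions, not the authors' implementation: the names `jaccard`, `make_scorer`, and `semi_markov_viterbi` are hypothetical, token-set Jaccard similarity stands in for whatever similarity function is used, and the 0.5 threshold is hand-set where a real system would presumably learn feature weights from data.

```python
from typing import Callable, List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)
Scorer = Callable[[List[str], int, int, str, Optional[str]], float]


def jaccard(a: List[str], b: List[str]) -> float:
    """Token-set Jaccard similarity (a stand-in similarity measure)."""
    sa, sb = {t.lower() for t in a}, {t.lower() for t in b}
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def make_scorer(dictionary: List[List[str]], threshold: float = 0.5) -> Scorer:
    """Build a toy segment scorer; the threshold is a hand-set assumption."""

    def score(tokens, start, end, label, prev_label):
        # prev_label is available for transition features; this toy ignores it.
        if label == "O":
            # Non-entity segments are restricted to single words.
            return 0.0 if end - start == 1 else float("-inf")
        segment = tokens[start:end]
        best = max((jaccard(segment, e) for e in dictionary), default=0.0)
        # Entity-level feature: similarity of the WHOLE segment to the dictionary.
        return best - threshold

    return score


def semi_markov_viterbi(
    tokens: List[str], labels: List[str], score: Scorer, max_len: int
) -> List[Segment]:
    """Return the highest-scoring segmentation with segments of length <= max_len."""
    n = len(tokens)
    neg_inf = float("-inf")
    # best[i][y]: best score for tokens[:i] when the last segment has label y.
    best = [{y: neg_inf for y in labels} for _ in range(n + 1)]
    back = [{y: None for y in labels} for _ in range(n + 1)]
    for i in range(1, n + 1):
        for y in labels:
            for d in range(1, min(max_len, i) + 1):
                j = i - d  # the candidate segment is tokens[j:i]
                prev_labels = [None] if j == 0 else labels
                for yp in prev_labels:
                    prefix = 0.0 if j == 0 else best[j][yp]
                    s = prefix + score(tokens, j, i, y, yp)
                    if s > best[i][y]:
                        best[i][y] = s
                        back[i][y] = (j, yp)
    if n == 0:
        return []
    # Trace back from the best final label.
    y = max(labels, key=lambda lab: best[n][lab])
    i, segments = n, []
    while i > 0:
        j, yp = back[i][y]
        segments.append((j, i, y))
        i, y = j, yp
    return segments[::-1]


tokens = "talk by William W. Cohen at CMU".split()
scorer = make_scorer([["William", "Cohen"], ["Sunita", "Sarawagi"]])
segments = semi_markov_viterbi(tokens, ["O", "ENT"], scorer, max_len=4)
print([" ".join(tokens[s:e]) for s, e, y in segments if y == "ENT"])
# prints ['William W. Cohen']
```

In this toy run the three-word span is selected as one entity because the whole segment is scored against the dictionary entry at once. A word-by-word classifier has no direct way to use that score, which is the mismatch the abstract describes.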

Authors

Sunita Sarawagi

William W. Cohen
