Name Tagging with Word Clusters and Discriminative Training (Paper)

2004 · Cited by 270

Topics: Topic Modeling; Natural Language Processing Techniques; Data Quality and Management



Abstract

We present a technique for augmenting annotated training data with hierarchical word clusters that are automatically derived from a large unannotated corpus. Cluster membership is encoded in features that are incorporated in a discriminatively trained tagging model. Active learning is used to select training examples. We evaluate the technique for named-entity tagging. Compared with a state-of-the-art HMM-based name finder, the presented technique requires only 13% as much annotated data to achieve the same level of performance. Given a large annotated training set of 1,000,000 words, the technique achieves a 25% reduction in error over the state-of-the-art HMM trained on the same material.
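As an illustration of how cluster membership might be encoded as features, here is a minimal sketch. It assumes each word is assigned a bit string describing its path from the root of a binary cluster hierarchy, and that prefixes of that string at several lengths serve as features of varying granularity. The cluster paths, prefix lengths, feature names, and the functions `cluster_features` and `token_features` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical toy cluster paths: each word maps to a bit string giving its
# path from the root of a binary cluster hierarchy. Real paths would be
# derived automatically from a large unannotated corpus.
TOY_CLUSTER_PATHS: Dict[str, str] = {
    "boston": "110100",
    "chicago": "110101",
    "monday": "011010",
    "said": "001011",
}

# Assumed prefix lengths, chosen for illustration only.
PREFIX_LENGTHS: Sequence[int] = (2, 4, 6)


def cluster_features(
    word: str,
    paths: Dict[str, str] = TOY_CLUSTER_PATHS,
    prefix_lengths: Sequence[int] = PREFIX_LENGTHS,
) -> List[str]:
    """Return one feature per path prefix; short prefixes are coarse clusters."""
    path = paths.get(word.lower())
    if path is None:
        # Word was never seen in the clustering corpus.
        return ["cluster=UNK"]
    return [f"cluster[:{n}]={path[:n]}" for n in prefix_lengths if n <= len(path)]


def token_features(tokens: Sequence[str], i: int) -> List[str]:
    """Features for token i: its word identity plus cluster features
    for the current and the previous token."""
    feats = [f"word={tokens[i].lower()}"]
    feats += [f"cur:{f}" for f in cluster_features(tokens[i])]
    if i > 0:
        feats += [f"prev:{f}" for f in cluster_features(tokens[i - 1])]
    return feats


if __name__ == "__main__":
    print(token_features(["said", "Boston"], 1))
```

In this sketch, "boston" and "chicago" share the features for prefix lengths 2 and 4, so a discriminatively trained tagger could carry evidence from one word over to the other even if only one of them appears in the annotated data.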

Authors (3)

Alex Zamanian

Jethran Guinness

S.L. Miller
