WordNet improves text document clustering (paper)

2003, cited by 272

Natural Language Processing Techniques · Text and Document Classification Technologies · Topic Modeling

Abstract

Text document clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. The bag-of-words representation used for these clustering methods is often unsatisfactory because it ignores relationships between important terms that do not co-occur literally. To deal with this problem, we integrate background knowledge (in our application, WordNet) into the process of clustering text documents. We cluster the documents with a standard partitional algorithm. Our experimental evaluation on Reuters newsfeeds compares clustering results with pre-categorizations of the news. In the experiments, background knowledge improves results over the baseline for many interesting tasks.
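The limitation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `TOY_CONCEPTS` dictionary is a hypothetical, hand-written stand-in for WordNet, and adding concept features alongside the original terms is only one assumed way of integrating background knowledge. The sketch shows two documents that share no literal term, so their bag-of-words similarity is zero, becoming similar once shared concepts are added to the vectors.

```python
from collections import Counter
from math import sqrt

# Hypothetical stand-in for WordNet: maps a term to a shared concept label.
# A real system would look these up in WordNet instead.
TOY_CONCEPTS = {
    "beef": "concept:meat",
    "pork": "concept:meat",
    "wheat": "concept:grain",
    "corn": "concept:grain",
}


def bag_of_words(text):
    """Count whitespace-separated, lowercased terms."""
    return Counter(text.lower().split())


def enrich(bag, concepts=TOY_CONCEPTS):
    """Return a copy of the bag with one concept feature added
    per occurrence of each term that has a concept mapping."""
    enriched = Counter(bag)
    for term, count in bag.items():
        concept = concepts.get(term)
        if concept is not None:
            enriched[concept] += count
    return enriched


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc_a = bag_of_words("beef exports rise")
doc_b = bag_of_words("pork shipments grow")

# No literal term is shared, so the plain vectors are orthogonal.
plain_similarity = cosine(doc_a, doc_b)

# Both documents gain the feature "concept:meat", so they now overlap.
enriched_similarity = cosine(enrich(doc_a), enrich(doc_b))

print(plain_similarity, enriched_similarity)
```

A partitional algorithm such as k-means that operates on the enriched vectors could then place the two documents in the same cluster, which the plain vectors give it no reason to do.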

Authors (3)

Gerd Stumme

Steffen Staab

Andreas Hotho
