Exploiting internal and external semantics for the clustering of short texts using world knowledge (Paper)

2009 · Cited by 250

Topic Modeling; Text and Document Classification Technologies; Sentiment Analysis and Opinion Mining

Abstract

Clustering short texts, such as snippets, poses great challenges for existing aggregated search techniques because of data sparseness and the complex semantics of natural language. Because short texts do not provide sufficient term occurrence information, traditional text representation methods, such as the "bag of words" model, have several limitations when applied directly to short-text tasks. In this paper, we propose a novel framework that improves short-text clustering by exploiting the internal semantics of the original text and external concepts from world knowledge. The proposed method employs a hierarchical three-level structure to tackle the data sparsity problem of the original short texts and reconstructs the corresponding feature space by integrating multiple semantic knowledge bases, namely Wikipedia and WordNet. Empirical evaluation on Reuters and a real web dataset demonstrates that our approach achieves significant improvement over state-of-the-art methods.
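The sparsity problem described in the abstract can be illustrated with a minimal sketch. This is not the paper's three-level framework: it only shows why two short texts with no shared terms look unrelated under a plain bag-of-words model, and how adding external concept features can reveal a connection. The `TOY_CONCEPTS` table, the helper names, and the example snippets are all hypothetical; a real system would look concepts up in a knowledge base such as Wikipedia or WordNet.

```python
import math
from collections import Counter

# Hypothetical stand-in for an external knowledge base lookup.
# These term-to-concept mappings are invented for illustration only.
TOY_CONCEPTS = {
    "jaguar": ["car"],
    "bmw": ["car"],
    "sedan": ["car"],
    "coupe": ["car"],
}


def bag_of_words(text):
    """Count whitespace-separated, lowercased terms."""
    return Counter(text.lower().split())


def enrich(features, concepts=TOY_CONCEPTS):
    """Return a copy of the features with external concept counts added."""
    enriched = Counter(features)
    for term, count in features.items():
        for concept in concepts.get(term, []):
            # Prefix concept features so they cannot collide with raw terms.
            enriched["concept:" + concept] += count
    return enriched


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


snippet_a = bag_of_words("jaguar sedan review")
snippet_b = bag_of_words("bmw coupe prices")

# No shared terms, so the plain bag-of-words similarity is zero.
plain_similarity = cosine(snippet_a, snippet_b)

# Both snippets now share the "concept:car" feature.
enriched_similarity = cosine(enrich(snippet_a), enrich(snippet_b))

print(plain_similarity, enriched_similarity)
```

In this toy setup the enriched vectors overlap only through the shared concept feature, which is the kind of signal a clustering algorithm could then use to group the two snippets.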

Authors

Chao Zhang

Tat‐Seng Chua

Nan Sun

Xia Hu
