Improving Topic Models with Latent Feature Word Representations (Paper)

2015 · Transactions of the Association for Computational Linguistics · Citations: 321 · Top venue

Topic Modeling; Text and Document Classification Technologies; Advanced Text Analysis Techniques

Abstract

Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, our new models produce significant improvements on topic coherence, document clustering and document classification tasks, especially on datasets with few or short documents.
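
As a rough illustration of how word vectors trained on a large external corpus could inform the word-topic mapping learnt on a small one, the sketch below mixes a count-based topic-word distribution with an embedding-based one. The abstract does not give the model's equations, so the mixture form, the softmax over dot products, the mixing weight `lam`, and every name in the code are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Turn arbitrary real scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_topic_word_probs(phi_t, topic_vec, word_vecs, lam):
    """Hypothetical helper: blend two word distributions for one topic.

    phi_t     -- count-based topic-word probabilities from the small corpus
    topic_vec -- assumed topic vector living in the word-embedding space
    word_vecs -- pre-trained word vectors, one per vocabulary entry
    lam       -- assumed mixing weight in [0, 1] for the embedding component
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Score each word by its dot product with the topic vector.
    scores = [sum(a * b for a, b in zip(topic_vec, w)) for w in word_vecs]
    latent = softmax(scores)
    return [(1.0 - lam) * p + lam * q for p, q in zip(phi_t, latent)]


# Toy vocabulary of three words; the second word never co-occurs with this
# topic in the small corpus, but its vector is close to the first word's.
word_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
phi_t = [0.7, 0.0, 0.3]
topic_vec = [2.0, 0.0]

probs = mixed_topic_word_probs(phi_t, topic_vec, word_vecs, lam=0.6)
print(probs)
```

In this toy setup the second word receives non-zero probability even though its count-based estimate is zero, which is one intuition for why external word vectors might help most on datasets with few or short documents.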

Authors

Mark Johnson

Lan Du

Richard Billingsley

Dat Quoc Nguyen
