Importance of semantic representation: dataless classification (paper)

National Conference on Artificial Intelligence, 2008 · Cited by 217

Topics: Topic Modeling; Text and Document Classification Technologies; Natural Language Processing Techniques

Abstract

Traditionally, text categorization has been studied as the problem of training a classifier on labeled data. However, people can categorize documents into named categories without any explicit training, because we know the meaning of the category names. In this paper, we introduce Dataless Classification, a learning protocol that uses world knowledge to induce classifiers without the need for any labeled data. Like humans, a dataless classifier interprets a string of words as a set of semantic concepts. We propose a model for dataless classification and show that the label name alone is often sufficient to induce classifiers. Using Wikipedia as our source of world knowledge, we obtain 85.29% accuracy on tasks from the 20 Newsgroups dataset and 88.62% accuracy on tasks from a Yahoo! Answers dataset, without using any labeled or unlabeled data from these datasets. With unlabeled data, we can further improve the results and achieve performance competitive with a supervised learning algorithm that uses 100 labeled examples.
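The core idea (map both the document and the label name into a shared concept space, then pick the closest label) can be illustrated with a minimal sketch. This is not the authors' implementation: the `WORD_TO_CONCEPTS` dictionary is a hypothetical toy stand-in for Wikipedia-derived world knowledge, and the helpers `to_concepts`, `cosine`, and `dataless_classify` are illustrative names. Nearest-label matching by cosine similarity is assumed here as one simple way to realize the idea.

```python
import math
from collections import Counter

# Hypothetical toy "world knowledge": each word maps to weighted concepts.
# In the paper this role is played by Wikipedia; this dictionary is only
# an illustrative stand-in, not the authors' resource.
WORD_TO_CONCEPTS = {
    "hockey": {"Ice_hockey": 1.0, "Sport": 0.6},
    "puck": {"Ice_hockey": 0.9},
    "goalie": {"Ice_hockey": 0.7, "Sport": 0.4},
    "sports": {"Sport": 1.0},
    "computer": {"Computer": 1.0},
    "graphics": {"Computer_graphics": 1.0, "Computer": 0.5},
    "rendering": {"Computer_graphics": 0.8},
    "pixel": {"Computer_graphics": 0.7, "Computer": 0.3},
}


def to_concepts(text):
    """Map a string of words to a sparse concept vector by summing
    the concept weights of every known word (unknown words are ignored)."""
    vec = Counter()
    for word in text.lower().split():
        for concept, weight in WORD_TO_CONCEPTS.get(word, {}).items():
            vec[concept] += weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dataless_classify(document, label_names):
    """Return the label whose *name* is closest to the document in concept
    space. No labeled training documents are used at any point."""
    doc_vec = to_concepts(document)
    scores = {label: cosine(doc_vec, to_concepts(label)) for label in label_names}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    labels = ["hockey", "computer graphics"]
    print(dataless_classify("the goalie stopped the puck", labels))
    print(dataless_classify("rendering each pixel", labels))
```

With the toy dictionary above, "the goalie stopped the puck" is assigned to `hockey` and "rendering each pixel" to `computer graphics`, even though neither document shares a word with its label name; the match happens only through shared concepts. A real system would replace the hand-written dictionary with a large knowledge-derived representation.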

Authors

Vivek Srikumar

Dan Roth

Lev Ratinov

Ming‐Wei Chang
