A Statistical Learning Model of Text Classification for Support Vector Machines (Paper)

2001 · Citations: 239

Details

Publication date
2001-01-01
Publication year
2001

Keywords

Statistical and Computational Modeling; Text and Document Classification Technologies; Neural Networks and Applications

Abstract

This paper develops a theoretical learning model of text classification for Support Vector Machines (SVMs). It connects the statistical properties of text-classification tasks with the generalization performance of an SVM in a quantitative way. Unlike conventional approaches to learning text classifiers, which rely primarily on empirical evidence, this model explains why and when SVMs perform well for text classification. In particular, it addresses the following questions: Why can support vector machines handle the large feature spaces in text classification effectively? How is this related to the statistical properties of text? What are sufficient conditions for applying SVMs to text-classification problems successfully?