Comparison of Naive Bayes, Random Forest, Decision Tree, Support Vector Machines, and Logistic Regression Classifiers for Text Reviews Classification (Paper)
2017 | Baltic Journal of Modern Computing | Cited by 311
Details
- Journal/Conference: Baltic Journal of Modern Computing
- Publication date: 2017-01-01
- Publication year: 2017
Keywords
Text and Document Classification Technologies; Advanced Text Analysis Techniques; Spam and Phishing Detection
Abstract
Today, highly scalable computing environments make it possible to carry out various data-intensive natural language processing and machine-learning tasks. One of these is text classification, several aspects of which have recently been investigated by many data scientists. The authors of this paper investigate Naive Bayes, Random Forest, Decision Tree, Support Vector Machines, and Logistic Regression classifiers implemented in Apache Spark, an in-memory computing platform. The focus of the paper is on comparing these classifiers by evaluating classification accuracy as a function of the training data set size and the number of n-grams. In the experiments, short texts from Amazon product-review data were analyzed.
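The kind of comparison the abstract describes (classification accuracy as the n-gram range changes) can be illustrated with a minimal sketch. This is not the authors' Spark implementation: it is a plain-Python multinomial Naive Bayes with Laplace smoothing, and the function names (`ngrams`, `train_nb`, `predict_nb`, `accuracy`) and the tiny `TRAIN`/`TEST` review sets are hypothetical, invented only for illustration.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max):
    """Return all word n-grams of length 1..n_max from a lowercased text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


def train_nb(samples, n_max):
    """Collect per-class document and feature counts from (text, label) pairs."""
    class_docs = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_docs[label] += 1
        for f in ngrams(text, n_max):
            feat_counts[label][f] += 1
            vocab.add(f)
    return class_docs, feat_counts, vocab


def predict_nb(model, text, n_max):
    """Pick the class with the highest log-posterior (Laplace smoothing)."""
    class_docs, feat_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        score = math.log(class_docs[label] / total_docs)
        denom = sum(feat_counts[label].values()) + len(vocab)
        for f in ngrams(text, n_max):
            if f in vocab:  # features never seen in training are ignored
                score += math.log((feat_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, samples, n_max):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict_nb(model, t, n_max) == y for t, y in samples)
    return correct / len(samples)


# Hypothetical toy reviews (NOT the Amazon data used in the paper).
TRAIN = [
    ("great product works well", "pos"),
    ("love it excellent quality", "pos"),
    ("very good value great price", "pos"),
    ("terrible product broke quickly", "neg"),
    ("poor quality very bad", "neg"),
    ("waste of money awful", "neg"),
]
TEST = [
    ("excellent product great quality", "pos"),
    ("awful quality broke", "neg"),
]

results = {}
for n_max in (1, 2):  # unigrams only, then unigrams plus bigrams
    model = train_nb(TRAIN, n_max)
    results[n_max] = accuracy(model, TEST, n_max)
    print(f"n-grams up to {n_max}: accuracy = {results[n_max]:.2f}")
```

Varying the size of `TRAIN` in the same loop would give the second axis of the comparison, accuracy against training set size. A data set this small says nothing about how the classifiers rank in practice.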