Comparison of Naive Bayes, Random Forest, Decision Tree, Support Vector Machines, and Logistic Regression Classifiers for Text Reviews Classification (Paper)

2017 | Baltic Journal of Modern Computing | Citations: 311 | Top venue
Text and Document Classification Technologies; Advanced Text Analysis Techniques; Spam and Phishing Detection

Details

Journal/Conference
Baltic Journal of Modern Computing
Publication date
2017-01-01
Publication year
2017

Keywords

Text and Document Classification Technologies; Advanced Text Analysis Techniques; Spam and Phishing Detection

Abstract

Today, a largely scalable computing environment provides a possibility of carrying out various data-intensive natural language processing and machine-learning tasks. One of these is text classification, some issues of which have recently been investigated by many data scientists. The authors of this paper investigate Naive Bayes, Random Forest, Decision Tree, Support Vector Machines, and Logistic Regression classifiers implemented in Apache Spark, i.e., the in-memory intensive computing platform. The focus of the paper is on comparing these classifiers by evaluating the classification accuracy, based on the size of training data sets and the number of n-grams. In the experiments, short texts from Amazon product-review data were analyzed.
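The two quantities named in the abstract, n-gram features and classification accuracy, can be illustrated with a minimal sketch in plain Python. This is not the authors' Apache Spark implementation: the function names `word_ngrams`, `ngram_features`, and `accuracy` are hypothetical, whitespace tokenization is an assumption, and combining all n-grams from 1 up to `max_n` is one possible reading of "the number of n-grams".

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of `text` as space-joined strings.

    Assumption: lowercasing and whitespace tokenization; the paper's
    actual preprocessing is not described in the abstract.
    """
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, max_n):
    """Count every n-gram for n = 1..max_n (a bag-of-n-grams vector)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(text, n))
    return counts


def accuracy(predicted, actual):
    """Fraction of predicted labels that equal the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Illustrative review text, not taken from the paper's data set.
review = "Great product works great"
features = ngram_features(review, 2)
```

Here `features` holds counts for both unigrams and bigrams of the review, and `accuracy` can score any classifier's predictions against held-out labels, which is the kind of measurement the comparison relies on.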