Distribution-balanced stratified cross-validation for accuracy estimation (paper)

2000 · Journal of Experimental & Theoretical Artificial Intelligence · Cited by 237

Artificial Intelligence: Machine Learning and Data Classification; Imbalanced Data Classification Techniques; Data Mining Algorithms and Applications

Abstract

Cross-validation has often been applied in machine learning research for estimating the accuracies of classifiers. In this work, we propose an extension to this method, called distribution-balanced stratified cross-validation (DBSCV), which improves the estimation quality by providing balanced intraclass distributions when partitioning a data set into multiple folds. We have tested DBSCV on nine real-world and three artificial domains using the C4.5 decision tree classifier. The results show that DBSCV performs better (has smaller biases) than regular stratified cross-validation in most cases, especially when the number of folds is small. The analysis and experiments based on three artificial data sets also reveal that DBSCV is particularly effective when multiple intraclass clusters exist in a data set.

Keywords: cross-validation; machine learning research; true accuracy; classifier
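The abstract does not spell out how the folds are built, so the following is only a minimal sketch of one way to obtain "balanced intraclass distributions". It assumes a nearest-neighbour chaining heuristic: within each class, start from a random sample, repeatedly hop to the closest unassigned sample of the same class, and deal the visited samples to the folds in round-robin order, so that members of the same intraclass cluster are spread across folds. The function name `dbscv_folds`, its parameters, and the use of Euclidean distance are illustrative assumptions, not details taken from the paper.

```python
import math
import random
from collections import defaultdict


def dbscv_folds(X, y, k, seed=0):
    """Return a fold index (0..k-1) for every sample (hypothetical helper).

    Assumption, not confirmed by the abstract: within each class, walk from a
    random start to the nearest unassigned same-class neighbour and deal the
    visited samples to folds round-robin, so nearby samples end up in
    different folds.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    fold_of = [None] * len(y)
    next_fold = 0  # carried across classes to keep overall fold sizes even
    for label in sorted(by_class):
        remaining = set(by_class[label])
        current = rng.choice(sorted(remaining))  # random starting sample
        while True:
            fold_of[current] = next_fold
            next_fold = (next_fold + 1) % k
            remaining.discard(current)
            if not remaining:
                break
            # Hop to the closest same-class sample that is still unassigned
            # (ties broken by index so the walk is deterministic per seed).
            current = min(
                remaining,
                key=lambda j: (math.dist(X[current], X[j]), j),
            )
    return fold_of
```

Calling `dbscv_folds(X, y, k=2)` on a list of feature tuples `X` and labels `y` returns one fold index per sample. Under the stated assumption, each class contributes nearly equal counts to every fold (as in ordinary stratification), and consecutive neighbours along the walk never share a fold when `k >= 2`.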

Authors (2)

Tony Martinez

Xinchuan Zeng
