On biases in estimating multi-valued attributes
1995, cited by 258
Rough Sets and Fuzzy Logic; Neural Networks and Applications; Data Mining Algorithms and Applications
Abstract
We analyse the biases of eleven measures for estimating the quality of multi-valued attributes. The values of information gain, J-measure, gini-index, and relevance tend to increase linearly with the number of values of an attribute. The values of gain-ratio, distance measure, Relief, and the weight of evidence decrease for informative attributes and increase for irrelevant attributes. The bias of the statistical tests based on the chi-square distribution is similar, but these functions are not able to discriminate among attributes of different quality. We also introduce a new function based on the MDL principle whose value slightly decreases with the increasing number of an attribute's values.
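
The bias described for information gain can be illustrated with a small simulation. The following is a minimal sketch, not the paper's experimental setup: it assumes a two-class problem with 1000 examples and an irrelevant attribute drawn uniformly at random, and the helper names (`entropy`, `information_gain`, `mean_gain_of_random_attribute`) are hypothetical. It only shows that the estimated gain of an attribute carrying no information about the class tends to grow as the attribute takes more values.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attribute, labels):
    """Class entropy minus the weighted class entropy within each attribute value."""
    n = len(labels)
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def mean_gain_of_random_attribute(num_values, n_examples=1000, trials=20, seed=0):
    """Average information gain of an attribute that is independent of the class.

    Assumption for illustration: both the class and the attribute are
    sampled uniformly at random, so the true gain is zero.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        labels = [rng.randint(0, 1) for _ in range(n_examples)]
        attribute = [rng.randrange(num_values) for _ in range(n_examples)]
        total += information_gain(attribute, labels)
    return total / trials


if __name__ == "__main__":
    # Print the average estimated gain for attributes with more and more values.
    for k in (2, 4, 8, 16, 32):
        print(k, round(mean_gain_of_random_attribute(k), 4))
```

Because the attribute is independent of the class, any nonzero gain here comes purely from estimation on a finite sample, which is why an attribute with many values can look better than it is.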