MLCM: Multi-Label Confusion Matrix (Paper)

2022 · IEEE Access · 397 citations · Top venue

Artificial Intelligence · Text and Document Classification Technologies · Machine Learning in Bioinformatics · Spam and Phishing Detection

Abstract

Concise and unambiguous assessment of a machine learning algorithm is key to classifier design and performance improvement. In the multi-class classification task, where each instance can only be labeled as one class, the confusion matrix is a powerful tool for performance assessment by quantifying the classification overlap. However, in the multi-label classification task, where each instance can be labeled with more than one class, the confusion matrix is undefined. Performance assessment of the multi-label classifier is currently based on calculating performance averages, such as hamming loss, precision, recall, and F-score. While the current assessment techniques present a reasonable representation of each class and overall performance, their aggregate nature results in ambiguity when identifying false negative (FN) and false positive (FP) results. To address this gap, we define a method of creating the multi-label confusion matrix (MLCM) based on three proposed categories of multi-label problems. After establishing the shortcomings of current methods for identifying FN and FP, we demonstrate the usage of the MLCM with the classification of two publicly available multi-label data sets: i) a 12-lead ECG data set with nine classes, and ii) a movie poster data set with eighteen classes. A comparison of the MLCM results against statistics from the current techniques is presented to show the effectiveness in providing a concise and unambiguous understanding of a multi-label classifier behavior.
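
The ambiguity the abstract describes can be seen in a small sketch. The code below is not the MLCM algorithm from the paper; it only computes the aggregate statistics the abstract names (hamming loss, per-label precision and recall) on made-up toy data. The label names A, B, C, the data, and the helper function names are all assumptions introduced for illustration.

```python
# Toy multi-label data: rows are instances, columns are labels A, B, C.
# All values are invented for illustration only.
y_true = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
]
y_pred = [
    [0, 1, 0],  # A missed, B predicted instead
    [0, 0, 0],  # A missed, nothing predicted
    [0, 1, 1],  # fully correct
    [1, 0, 1],  # C correct, A wrongly added
]


def hamming_loss(y_true, y_pred):
    """Fraction of (instance, label) slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def per_label_counts(y_true, y_pred):
    """Return one (tp, fp, fn) tuple per label column."""
    counts = []
    for j in range(len(y_true[0])):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(t == 1 and p == 1 for t, p in pairs)
        fp = sum(t == 0 and p == 1 for t, p in pairs)
        fn = sum(t == 1 and p == 0 for t, p in pairs)
        counts.append((tp, fp, fn))
    return counts


def precision_recall(tp, fp, fn):
    """Per-label precision and recall, defined as 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    print("hamming loss:", hamming_loss(y_true, y_pred))
    for name, (tp, fp, fn) in zip("ABC", per_label_counts(y_true, y_pred)):
        print(name, "tp/fp/fn:", (tp, fp, fn), "P/R:", precision_recall(tp, fp, fn))
```

In this toy case label A has two false negatives, but the per-label counts do not show that one of them coincided with a wrong prediction of B while the other had no prediction at all. That missing "confused with what" information is the kind of gap the abstract attributes to aggregate metrics.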

Authors

Reza Samavi

Thomas E. Doyle

Mohammadreza Heydarian
