Evaluating and optimizing autonomous text classification systems 论文

1995引用 312
Machine Learning and AlgorithmsAlgorithms and Data CompressionMachine Learning and Data Classification

摘要

Text retrieval systems typically produce a ranking of documents and let a user decide how far down that ranking to go. In contrast, programs that filter text streams, software that categorizes documents, agents which alert users, and many other IR systems must make decisions without human input or supervision. It is important to define what constitutes good effectiveness for these autonomous systems, tune the systems to achieve the highest possible effectiveness, and estimate how the effectiveness changes as new data is processed. We show how to do this for binary text classification systems, emphasizing that different goals for the system lead to different optimal behaviors. Optimizing and estimating effectiveness is greatly aided if classifiers that explicitly estimate the probability of class membership are used. 1