Assessing Classifiers from Two Independent Data Sets Using ROC Analysis: A Nonparametric Approach

Bibliographic Details
Published in:	IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, no. 11, pp. 1809-1817
Main Authors:	Yousef, W.A., Wagner, R.F., Loew, M.H.
Format:	Journal Article
Language:	English
Published:	Los Alamitos, CA: IEEE Computer Society; The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01-11-2006
Subjects:	Algorithms | Applied sciences | Artificial Intelligence | Classification | Classifiers | Cluster Analysis | Computer science; control theory; systems | Connectionism. Neural networks | Data analysis | Data distribution | Databases, Factual | Decision theory | Estimates | Estimators | Exact sciences and technology | Image Enhancement - methods | Image Interpretation, Computer-Assisted - methods | Information Storage and Retrieval - methods | Mathematical analysis | Mathematical models | Medical diagnosis | Nonparametric statistics | Parameter estimation | Pattern analysis | Pattern Recognition, Automated - methods | Probability density function | Random variables | Receiver operating characteristic curves | ROC analysis | ROC Curve | Statistical analysis | Statistical distributions | Testing | Training | Training data | Uncertainty | Variance

Description
Summary:	This paper considers binary classification. We assess a classifier in terms of the area under the ROC curve (AUC). We estimate three important parameters: the conditional AUC (conditional on a particular training set) and the mean and variance of this AUC. We also derive a closed-form expression for the variance of the estimator of the AUC. This expression exhibits several components of variance that facilitate an understanding of the sources of uncertainty of that estimate. In addition, we estimate this variance, i.e., the variance of the conditional AUC estimator. Our approach is nonparametric and based on general methods from U-statistics; it addresses the case where the data distribution is neither known nor modeled and where there are only two available data sets, the training and testing sets. Finally, we illustrate some simulation results for these estimators.
ISSN:	0162-8828 1939-3539 2160-9292
DOI:	10.1109/TPAMI.2006.218
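
Illustrative note: the summary describes a nonparametric, U-statistic-based estimate of the AUC computed on a testing set for a classifier trained on a fixed training set. The sketch below is not the paper's estimator. It shows the standard two-sample Mann-Whitney form of the AUC, plus a commonly used structural-components (DeLong-style) estimate of its variance over the testing set only, which does not capture the training-set variability the paper analyzes. The function name `mann_whitney_auc` and its arguments are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def mann_whitney_auc(neg_scores, pos_scores):
    """Return (auc, var) for classifier scores on a test set.

    auc: two-sample Mann-Whitney U-statistic, the fraction of
         (positive, negative) pairs ranked correctly, ties counted as 0.5.
    var: structural-components estimate of the variance of auc over
         test sets, with the trained classifier held fixed.
    Assumes at least two scores in each class.
    """
    neg = np.asarray(neg_scores, dtype=float)
    pos = np.asarray(pos_scores, dtype=float)

    # Kernel psi(x, y): 1 if positive score > negative score, 0.5 if tied, else 0.
    diff = pos[:, None] - neg[None, :]
    kernel = (diff > 0).astype(float) + 0.5 * (diff == 0)

    auc = kernel.mean()

    # Per-case components: average of the kernel over the other class.
    v_pos = kernel.mean(axis=1)
    v_neg = kernel.mean(axis=0)
    var = v_pos.var(ddof=1) / len(pos) + v_neg.var(ddof=1) / len(neg)
    return auc, var


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    auc, var = mann_whitney_auc([0.1, 0.4, 0.35], [0.8, 0.4, 0.9])
    print(auc, var)
```

Averaging the kernel over all positive-negative pairs is what makes the estimate a two-sample U-statistic; no distributional model of the scores is needed.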