Efficient Prequential AUC-PR Computation

Bibliographic Details
Published in: 2023 International Conference on Machine Learning and Applications (ICMLA), pp. 2222-2227
Main Authors: Pereira Gomes, David L.; Gregio, Andre; Zanata Alves, Marco A.; Lisboa de Almeida, Paulo R.
Format: Conference Proceeding
Language: English
Published: IEEE 15-12-2023
Description
Summary: When dealing with classification problems for data streams, we often need to compute the classification metrics in a prequential manner. The Area Under the Precision-Recall Curve (AUC-PR) metric is extensively used in imbalanced classification scenarios, where the negative class outnumbers the positive one. Despite its advantages, it may be computationally expensive to recompute that metric every time a new test instance becomes available. In this work, we present an efficient algorithm to compute the AUC-PR in a prequential way. Our proposed algorithm uses a self-balancing binary search tree to avoid the need to reorder the data when updating the AUC-PR value with the most recent data. Our experiments take into consideration six well-known, publicly available stream-based datasets. Our experiments show that our approach can be up to 13 times faster and use 12 times less energy than the traditional batch approach when considering a window of size 1,000.
ISSN: 1946-0759
DOI: 10.1109/ICMLA58977.2023.00335
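
Illustrative sketch (not the authors' implementation): the summary describes keeping the windowed scores in sorted order so that each new test instance does not force a full re-sort. The Python sketch below shows that general idea under several assumptions. It swaps the paper's self-balancing binary search tree for a plain sorted list maintained with the standard `bisect` module (so insertions and deletions cost O(w) rather than O(log w)), it approximates AUC-PR with step-wise average precision, and it still sweeps the whole window to obtain the area. The names `average_precision` and `SlidingWindowAUCPR` are hypothetical and do not come from the paper.

```python
from bisect import bisect_left, insort
from collections import deque


def average_precision(sorted_pairs):
    """Step-wise AUC-PR from (score, label) pairs sorted by ascending score.

    Labels are 1 (positive) or 0 (negative). Tied scores are treated as a
    single threshold. Returns 0.0 when the window holds no positives.
    """
    total_pos = sum(label for _, label in sorted_pairs)
    if total_pos == 0:
        return 0.0
    tp = fp = 0
    prev_recall = 0.0
    area = 0.0
    i = len(sorted_pairs) - 1
    while i >= 0:  # walk from the highest score down to the lowest
        score = sorted_pairs[i][0]
        while i >= 0 and sorted_pairs[i][0] == score:
            if sorted_pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i -= 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


class SlidingWindowAUCPR:
    """Prequential AUC-PR over the most recent `window_size` instances.

    Assumption: a sorted list stands in for the self-balancing binary
    search tree mentioned in the summary.
    """

    def __init__(self, window_size=1000):
        self.window_size = window_size
        self.arrivals = deque()  # pairs in arrival order, for eviction
        self.ordered = []        # the same pairs, kept sorted by score

    def update(self, score, label):
        pair = (score, label)
        if len(self.arrivals) == self.window_size:
            oldest = self.arrivals.popleft()
            # Binary search locates the evicted pair; no re-sort is needed.
            del self.ordered[bisect_left(self.ordered, oldest)]
        self.arrivals.append(pair)
        insort(self.ordered, pair)
        return average_precision(self.ordered)


if __name__ == "__main__":
    metric = SlidingWindowAUCPR(window_size=3)
    for s, y in [(0.9, 1), (0.8, 0), (0.7, 1), (0.6, 0)]:
        print(round(metric.update(s, y), 4))
```

The batch approach mentioned in the summary would instead re-sort all instances in the window on every update; here the ordering is repaired incrementally by one deletion and one insertion per new instance.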