Positive-Unlabeled Learning using Random Forests via Recursive Greedy Risk Minimization
Main Authors:
Format: Journal Article
Language: English
Published: 16-10-2022
Subjects:
Online Access: Get full text
Summary: The need to learn from positive and unlabeled data, or PU learning, arises in many applications and has attracted increasing interest. While random forests are known to perform well on many tasks with positive and negative data, recent PU algorithms are generally based on deep neural networks, and the potential of tree-based PU learning is under-explored. In this paper, we propose new random forest algorithms for PU learning. Key to our approach is a new interpretation of decision tree algorithms for positive and negative data as \emph{recursive greedy risk minimization algorithms}. We extend this perspective to the PU setting to develop new decision tree learning algorithms that directly minimize PU-data based estimators of the expected risk. This allows us to develop an efficient PU random forest algorithm, PU extra trees. Our approach features three desirable properties: it is robust to the choice of the loss function, in the sense that various loss functions lead to the same decision trees; it requires little hyperparameter tuning compared to neural network based PU learning; and it supports a feature importance measure that directly quantifies a feature's contribution to risk minimization. Our algorithms demonstrate strong performance on several datasets. Our code is available at \url{https://github.com/puetpaper/PUExtraTrees}.
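For context, PU-data based risk estimators of the kind the abstract refers to rewrite the expected risk using only positive and unlabeled samples. A standard choice in the literature is the unbiased PU estimator of du Plessis et al., where $\pi = P(Y = +1)$ is the class prior, $\ell$ a loss, and $g$ the classifier; the paper's exact estimator may differ:

```latex
\hat{R}_{\mathrm{PU}}(g)
  = \frac{\pi}{n_{\mathrm{P}}} \sum_{i=1}^{n_{\mathrm{P}}} \ell\!\left(g(x^{\mathrm{P}}_i), +1\right)
  + \frac{1}{n_{\mathrm{U}}} \sum_{j=1}^{n_{\mathrm{U}}} \ell\!\left(g(x^{\mathrm{U}}_j), -1\right)
  - \frac{\pi}{n_{\mathrm{P}}} \sum_{i=1}^{n_{\mathrm{P}}} \ell\!\left(g(x^{\mathrm{P}}_i), -1\right)
```

Reading a decision tree as a recursive greedy risk minimizer, each candidate split can then be scored by the PU risk it induces when each child node is assigned its risk-minimizing label. The sketch below illustrates this idea under the 0-1 loss; it is not the paper's implementation, and all names (`node_risk`, `split_score`, `s`, `pi`) are illustrative:

```python
import numpy as np

def node_risk(y_pred, s_node, pi, n_p, n_u):
    """Contribution of one child node to the unbiased PU estimate of the
    0-1 risk when every point in the node is predicted as y_pred (+1/-1).
    s_node flags labeled positives (1) vs. unlabeled points (0) in the node;
    n_p, n_u are total counts of positives/unlabeled in the whole sample."""
    n_p_node = np.sum(s_node == 1)
    n_u_node = np.sum(s_node == 0)
    loss_pos = float(y_pred == -1)   # 0-1 loss if the true label is +1
    loss_neg = float(y_pred == +1)   # 0-1 loss if the true label is -1
    # pi * E_P[loss(g, +1)] + E_U[loss(g, -1)] - pi * E_P[loss(g, -1)]
    return (pi * n_p_node / n_p * loss_pos
            + n_u_node / n_u * loss_neg
            - pi * n_p_node / n_p * loss_neg)

def split_score(x_feat, s, threshold, pi, n_p, n_u):
    """PU risk after splitting on x_feat <= threshold, with each child
    greedily given whichever label (+1 or -1) minimizes its risk."""
    score = 0.0
    for mask in (x_feat <= threshold, x_feat > threshold):
        score += min(node_risk(+1, s[mask], pi, n_p, n_u),
                     node_risk(-1, s[mask], pi, n_p, n_u))
    return score
```

Growing a tree then amounts to recursively choosing the threshold with the lowest `split_score` (or, Extra-Trees style, the best of a few randomly drawn thresholds) until no split lowers the estimated risk.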
DOI: 10.48550/arxiv.2210.08461