Utilizing Nearest-Neighbor Clustering for Addressing Imbalanced Datasets in Bioengineering

Bibliographic Details
Published in:	Bioengineering (Basel) Vol. 11; no. 4; p. 345
Main Authors:	Huang, Chih-Ming; Lin, Chun-Hung; Hung, Chuan-Sheng; Zeng, Wun-Hui; Zheng, You-Cheng; Tsai, Chih-Min
Format:	Journal Article
Language:	English
Published:	Switzerland, MDPI AG, 01-04-2024
Subjects:	Algorithms; Bioengineering; Classification; Clustering; Data analysis; Data points; Datasets; Electrocardiography; Fault diagnosis; K-means with outlier removal (KMOR); Location-based Nearest Neighbor (LBNN); Medical research; Methods; Nearest-neighbor; Nosology; One-Class Nearest-Neighbor (OCNN); Outliers (statistics); Parameter identification; Physiology; Rare diseases; Support vector machines; Technology application; Taiwan

Description
Summary:	Imbalanced classification is common in fault diagnosis, intrusion detection, and medical diagnosis, where abnormal data are difficult to obtain. This article treats the task as a one-class problem and implements and refines the One-Class Nearest-Neighbor (OCNN) algorithm. The original inter-quartile range mechanism is replaced with the K-means with outlier removal (KMOR) algorithm, which identifies outliers in the target class efficiently; parameters are then optimized by treating these outliers as non-target-class samples. The article also proposes a new algorithm, the Location-based Nearest-Neighbor (LBNN) algorithm, which clusters the one-class training data using KMOR and, for each test data point, calculates the farthest distance and percentile to determine whether the point belongs to the target class. Experiments cover parameter studies, validation on eight standard imbalanced datasets from KEEL, and three applications to real medical imbalanced datasets. The results show better precision, recall, and G-means than traditional classification models, indicating that the approach is effective for imbalanced data.
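The summary describes clustering the target-class training data, setting aside outliers, and then accepting or rejecting a test point from its distance to the clusters. The Python sketch below illustrates that general idea only and is not the authors' KMOR or LBNN implementation. The function names, the parameters `k`, `gamma`, and `percentile`, the simplified outlier test (distance greater than `gamma` times the mean inlier distance), and the decision rule (distance to the nearest center compared with a per-cluster percentile radius) are all assumptions made for illustration.

```python
import numpy as np


def kmeans_with_outlier_flag(X, k=2, gamma=3.0, n_iter=50, seed=0):
    """Simplified, KMOR-inspired clustering (an assumption, not the paper's code).

    Runs Lloyd-style k-means, but flags a point as an outlier when its distance
    to the nearest center exceeds gamma times the mean distance of the current
    inliers. Flagged points are ignored when centers are updated.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    inlier = np.ones(len(X), dtype=bool)
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n_points, k).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        nearest = d.min(axis=1)
        # Re-flag outliers using the mean distance of the previous inliers.
        inlier = nearest <= gamma * nearest[inlier].mean()
        for j in range(k):
            members = X[inlier & (labels == j)]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    # Final assignment against the converged centers.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    nearest = d.min(axis=1)
    inlier = nearest <= gamma * nearest[inlier].mean()
    return centers, labels, inlier


def fit_one_class(X, k=2, gamma=3.0, percentile=95.0):
    """Fit cluster centers and one acceptance radius per cluster.

    The radius is the given percentile of inlier-to-center distances
    (a hypothetical stand-in for the distance/percentile rule in the summary).
    """
    centers, labels, inlier = kmeans_with_outlier_flag(X, k, gamma)
    radii = np.zeros(k)
    for j in range(k):
        members = X[inlier & (labels == j)]
        if len(members) > 0:
            dist = np.linalg.norm(members - centers[j], axis=1)
            radii[j] = np.percentile(dist, percentile)
    return centers, radii


def predict_target(model, X_test):
    """Return True for test points inside the radius of their nearest center."""
    centers, radii = model
    d = np.linalg.norm(X_test[:, None, :] - centers[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    return d[np.arange(len(X_test)), nearest] <= radii[nearest]
```

A model would be fitted on target-class samples only, for example `model = fit_one_class(X_train, k=2)`, and `predict_target(model, X_test)` then returns a boolean array marking the test points accepted as target class. Lowering `percentile` tightens the acceptance region, trading recall on the target class for fewer false acceptances.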
ISSN:	2306-5354
DOI:	10.3390/bioengineering11040345