A Novel Density-Based Approach for Instance Selection
Published in: | 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI) pp. 549 - 556 |
---|---|
Main Authors: | , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 01-11-2016 |
Subjects: | |
Summary: | Due to the increasing size of datasets, instance selection techniques have been applied to reduce data to a manageable volume, which in turn reduces the computational resources needed for the learning process. Instance selection algorithms can also be applied to remove useless, erroneous, or noisy instances before learning algorithms are applied. In recent years, several instance selection approaches have been proposed. However, most of them have high time complexity and therefore cannot be used with large datasets. In this paper, we present an algorithm called CDIS, which can be viewed as an improvement of a recently proposed density-based approach for instance selection. The main contribution of this paper is a formal characterization of a novel density function adopted by the CDIS algorithm. CDIS evaluates the instances of each class separately and keeps only the densest instances in a given (arbitrary) neighborhood, which keeps its time complexity reasonably low. Our approach was evaluated on 20 well-known datasets, and its performance was compared with that of six state-of-the-art algorithms using three measures: accuracy, reduction, and effectiveness. To evaluate the accuracy achieved with the datasets produced by each algorithm, we applied the KNN algorithm. The results show that our approach achieves a performance (in terms of the balance between accuracy and reduction) that is better than or comparable to that of the other algorithms considered in the evaluation. |
ISSN: | 2375-0197 |
DOI: | 10.1109/ICTAI.2016.0090 |
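The abstract describes the selection strategy only at a high level: each class is processed on its own, and only the densest instances within a neighborhood are kept. The following is a minimal Python/NumPy sketch of that general idea. It is not the CDIS algorithm, and it does not reproduce the density function the paper formally characterizes. The density used here (negative mean Euclidean distance to same-class instances), the choice of the k nearest same-class neighbors as the neighborhood, and the helper names `select_densest`, `reduction`, and `knn_accuracy` are all hypothetical. Reduction is assumed to be the fraction of instances removed, and accuracy is measured with a plain k-NN classifier trained on the kept instances.

```python
import numpy as np


def _density_and_distances(points: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Pairwise Euclidean distances within one class, plus a HYPOTHETICAL
    density: the negative mean distance from each point to all points of the
    class (higher value = denser). This is not the paper's density function."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return -dists.mean(axis=1), dists


def select_densest(X: np.ndarray, y: np.ndarray, k: int = 3) -> np.ndarray:
    """Return the sorted indices of the instances to keep.

    Each class is processed separately; an instance is kept when none of its
    k nearest same-class neighbors has a strictly higher density."""
    keep: list[int] = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)          # global indices of this class
        dens, dists = _density_and_distances(X[idx])
        for i in range(len(idx)):
            order = np.argsort(dists[i])
            neighbors = order[order != i][:k]      # exclude the instance itself
            if not np.any(dens[neighbors] > dens[i]):
                keep.append(int(idx[i]))
    return np.array(sorted(keep), dtype=int)


def reduction(n_original: int, n_kept: int) -> float:
    """Fraction of instances removed (assumed definition: 1 - |S| / |T|)."""
    return 1.0 - n_kept / n_original


def knn_accuracy(X_train, y_train, X_test, y_test, k: int = 1) -> float:
    """Accuracy of a plain majority-vote k-NN classifier (Euclidean distance).
    Labels must be non-negative integers because of np.bincount."""
    d = np.sqrt(((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1))
    nearest = np.argsort(d, axis=1)[:, :k]
    preds = np.array([np.bincount(y_train[row]).argmax() for row in nearest])
    return float(np.mean(preds == y_test))


if __name__ == "__main__":
    # Toy data: two well-separated classes, three points each, on a line.
    X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
                  [10.0, 0.0], [11.0, 0.0], [12.0, 0.0]])
    y = np.array([0, 0, 0, 1, 1, 1])
    kept = select_densest(X, y, k=1)
    print(kept.tolist())                            # [1, 4]: middle point of each class
    print(round(reduction(len(X), len(kept)), 3))   # 0.667
    X_test = np.array([[0.5, 0.0], [11.5, 0.0]])
    y_test = np.array([0, 1])
    print(knn_accuracy(X[kept], y[kept], X_test, y_test))  # 1.0
```

On the toy data, only the central point of each class survives, because each endpoint has a denser neighbor. Computing all pairwise distances within a class is quadratic in the class size, so this sketch favors clarity over the low time complexity the paper targets.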