A Novel Density-Based Approach for Instance Selection

Bibliographic Details
Published in: 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 549-556
Main Authors: Carbonera, Joel Luis; Abel, Mara
Format: Conference Proceeding
Language: English
Published: IEEE 01-11-2016
Description
Summary: As datasets grow, instance selection techniques have been applied to reduce the data to a manageable volume, which lowers the computational resources needed for the learning process. Instance selection algorithms can also remove useless, erroneous, or noisy instances before learning algorithms are applied. Several approaches to instance selection have been proposed in recent years. However, most of them have high time complexity and therefore cannot handle large datasets. In this paper, we present an algorithm called CDIS, which can be viewed as an improvement of a recently proposed density-based approach to instance selection. The main contribution of this paper is a formal characterization of a novel density function adopted by the CDIS algorithm. CDIS evaluates the instances of each class separately and keeps only the densest instances in a given (arbitrary) neighborhood, which ensures a reasonably low time complexity. Our approach was evaluated on 20 well-known datasets, and its performance was compared with that of 6 state-of-the-art algorithms on three measures: accuracy, reduction, and effectiveness. To evaluate the accuracy achieved with the datasets produced by the algorithms, we applied the KNN algorithm. The results show that our approach achieves a balance of accuracy and reduction that is better than or comparable to that of the other algorithms considered in the evaluation.
ISSN: 2375-0197
DOI: 10.1109/ICTAI.2016.0090
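
Illustrative sketch: The summary says that CDIS evaluates the instances of each class separately and keeps only the densest instances in a given neighborhood, but it does not state the density function itself. The Python sketch below illustrates only that general selection scheme. The density used here (negative mean distance to the other instances of the same class), the neighborhood size `k`, the function names, and the `reduction` formula are all assumptions made for illustration, not the definitions given in the paper.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def density(i, members, points):
    """Stand-in density (assumption, not the paper's function):
    negative mean distance from instance i to the other instances
    of the same class. Higher values mean a denser region."""
    others = [j for j in members if j != i]
    if not others:
        return 0.0
    return -sum(euclidean(points[i], points[j]) for j in others) / len(others)


def select_instances(points, labels, k=3):
    """Return sorted indices of the instances kept by the sketch.

    Each class is processed separately. An instance is kept when no
    instance among its k nearest same-class neighbours is denser.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept = []
    for members in by_class.values():
        dens = {i: density(i, members, points) for i in members}
        for i in members:
            neighbours = sorted(
                (j for j in members if j != i),
                key=lambda j: euclidean(points[i], points[j]),
            )[:k]
            if all(dens[i] >= dens[j] for j in neighbours):
                kept.append(i)
    return sorted(kept)


def reduction(n_original, n_selected):
    """Fraction of instances removed (assumed definition)."""
    return (n_original - n_selected) / n_original


if __name__ == "__main__":
    # Hypothetical toy data: four corners and a centre in class "a",
    # plus one isolated instance in class "b".
    points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
              (0.5, 0.5), (10.0, 10.0)]
    labels = ["a", "a", "a", "a", "a", "b"]
    kept = select_instances(points, labels, k=3)
    print(kept, reduction(len(points), len(kept)))
```

Because each class is handled on its own and each instance is compared only with a small neighborhood, the comparison step stays local. This sketch still computes all pairwise distances within a class, so it does not reproduce the time complexity claimed in the summary.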