A preliminary study on the reuse of subtrees within decision trees in a genetic programming context for data classification

Bibliographic Details
Published in:	2013 Third World Congress on Information and Communication Technologies (WICT 2013) pp. 285 - 290
Main Authors:	Dufourq, Emmanuel; Pillay, Nelishia
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01-12-2013
Subjects:	data classification, data mining, Encapsulation, genetic programming, Genetics, Glass, Iris, Meteorology, optimization, Sonar

Description
Summary:	Genetic programming (GP) has been successful in creating highly accurate models for data classification. In programming, creating functions is common practice because it isolates a piece of code so that it can be reused. The encapsulation genetic operator promotes modularization in a similar way: it encapsulates subtrees that GP trees can then reuse during the execution of the algorithm. Models created for data classification problems tend to be large and complex, which creates the need for modular acquisition methods that promote the reuse of existing subtrees when solving classification problems. The effect of the encapsulation operator on GP for data classification has not previously been investigated. Two approaches were proposed: the first incorporated the encapsulation operator with no limitations on how the encapsulated subtrees are used, and the second made use of a maintained list of encapsulated subtrees. The two proposed methods were tested on eight data sets, and the results show that the encapsulation operator improved the training accuracy on nearly every data set.
DOI:	10.1109/WICT.2013.7113150
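To make the idea of encapsulating and reusing subtrees concrete, here is a minimal Python sketch. It follows the general idea of Koza's encapsulation operation (a randomly chosen subtree is frozen into a named module and replaced by a leaf that refers to it); it is not the authors' implementation and does not reproduce either of the two approaches exactly. Everything in it is an assumption made for illustration: the binary threshold-test decision-tree representation, the `Node` class, the `E0`, `E1`, ... module names, the shared `library` list, and the helper functions `encapsulate`, `insert_module` and `classify`.

```python
import copy
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical decision-tree node.

    Internal node: `label` names an attribute and the node tests
    `sample[label] <= threshold`. Leaf: `label` is a class label, or the
    name of an encapsulated module when `module` is set.
    """
    label: str
    threshold: float = 0.0
    children: List["Node"] = field(default_factory=list)
    module: Optional["Node"] = None  # encapsulated subtree this leaf refers to


def all_nodes(tree: Node) -> List[Node]:
    """Return the nodes of `tree` in pre-order (module bodies are not traversed)."""
    nodes = [tree]
    for child in tree.children:
        nodes.extend(all_nodes(child))
    return nodes


def encapsulate(tree: Node, library: List[Node], rng: random.Random) -> Node:
    """Freeze a random internal subtree as module E<k> and replace it with a reference leaf."""
    internal = [n for n in all_nodes(tree) if n.children]
    if not internal:
        return tree  # a single leaf has nothing to encapsulate
    target = rng.choice(internal)
    module = copy.deepcopy(target)  # copy taken before the node is overwritten
    name = f"E{len(library)}"
    library.append(module)
    # Overwrite the chosen node in place so its parent now points at a module leaf.
    target.label, target.children, target.module = name, [], module
    return tree


def insert_module(tree: Node, library: List[Node], rng: random.Random) -> Node:
    """Hypothetical reuse step: point a random leaf of `tree` at a stored module."""
    if not library:
        return tree
    leaf = rng.choice([n for n in all_nodes(tree) if not n.children])
    index = rng.randrange(len(library))
    leaf.label, leaf.module = f"E{index}", library[index]
    return tree


def classify(node: Node, sample: Dict[str, float]) -> str:
    """Classify `sample` by walking the tree, evaluating module leaves by their stored subtree."""
    if node.module is not None:
        return classify(node.module, sample)
    if not node.children:
        return node.label  # ordinary leaf: a class label
    branch = 0 if sample[node.label] <= node.threshold else 1
    return classify(node.children[branch], sample)
```

Encapsulation alone does not change what a tree predicts, because the module leaf evaluates exactly the subtree it replaced. What encapsulation changes is the search: once a subtree is stored in `library`, other trees can receive it in a single step (as in `insert_module`) instead of having to rebuild it through crossover and mutation. A design that places no limits on how modules are used, and one that manages a maintained list of modules, would differ mainly in how the library is populated and queried.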