Automated Problem Identification: Regression vs Classification via Evolutionary Deep Networks
Regression or classification? This is perhaps the most basic question faced when tackling a new supervised learning problem. We present an Evolutionary Deep Learning (EDL) algorithm that automatically solves this by identifying the question type with high accuracy, along with a proposed deep archite...
Saved in:
Main Authors: | , |
---|---|
Format: | Journal Article |
Language: | English |
Published: |
03-07-2017
|
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Regression or classification? This is perhaps the most basic question faced
when tackling a new supervised learning problem. We present an Evolutionary
Deep Learning (EDL) algorithm that automatically solves this by identifying the
question type with high accuracy, along with a proposed deep architecture.
Typically, a significant amount of human insight and preparation is required
prior to executing machine learning algorithms. For example, when creating deep
neural networks, the number of parameters must be selected in advance and
furthermore, a lot of these choices are made based upon pre-existing knowledge
of the data such as the use of a categorical cross entropy loss function.
Humans are able to study a dataset and decide whether it represents a
classification or a regression problem, and consequently make decisions which
will be applied to the execution of the neural network. We propose the
Automated Problem Identification (API) algorithm, which uses an evolutionary
algorithm interface to TensorFlow to manipulate a deep neural network to decide
if a dataset represents a classification or a regression problem. We test API
on 16 different classification, regression and sentiment analysis datasets with
up to 10,000 features and up to 17,000 unique target values. API achieves an
average accuracy of $96.3\%$ in identifying the problem type without hardcoding
any insights about the general characteristics of regression or classification
problems. For example, API successfully identifies classification problems even
with 1000 target values. Furthermore, the algorithm recommends which loss
function to use and also recommends a neural network architecture. Our work is
therefore a step towards fully automated machine learning. |
---|---|
DOI: | 10.48550/arxiv.1707.00703 |