Parametrized linear regression for boxplot-multivalued data applied to the Brazilian Electric Sector

Bibliographic Details
Published in: Information Sciences, Vol. 652, p. 119758
Main Authors: Reyes, Dailys M.A.; Souza, Leandro C.; de Souza, Renata M.C.R.; de Oliveira, Adriano L.I.
Format: Journal Article
Language: English
Published: Elsevier Inc., 01-01-2024
Description
Summary: Symbolic boxplot data can be considered a particular case of the numerical multi-valued variable. This kind of symbolic data is a useful exploratory tool with a simple structure for summarizing groups of numerical data. However, it has received little attention in the symbolic data analysis literature. In this paper, we propose a new prediction method for extracting knowledge from boxplot data. A parametrized regression approach automatically extracts the best reference points from the regressor variables. These reference points are then used to build five linear regression models, one for each boxplot value: minimum (m), lower quartile (Q1), median (Q2), upper quartile (Q3) and maximum (M). A strategy based on the Box-Cox transformation is applied to the response variable to guarantee the mathematical coherence of the predictions (m ≤ Q1 ≤ Q2 ≤ Q3 ≤ M) and to build the boxplot. Experimental evaluation with synthetic and real boxplot datasets illustrates the advantages of the proposed method. Moreover, the present work also focuses on developing an application for predicting boxplot-based temperature data in the Brazilian Electric Sector.
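The summary describes one linear model per boxplot value plus a Box-Cox step that keeps the predicted values ordered. The Python sketch below shows one possible way to obtain that ordering by construction; it is an illustration under stated assumptions, not the authors' method. Assumptions: a single boxplot-valued regressor whose five statistics are all used as inputs (the paper's reference-point selection is not reproduced); the minimum is predicted directly, while the four positive gaps Q1 - m, Q2 - Q1, Q3 - Q2 and M - Q3 are modelled after a Box-Cox transform with a fixed, hypothetical λ = 0.5 (in practice λ would be estimated). All function names are hypothetical.

```python
import numpy as np

LAMBDA = 0.5  # hypothetical fixed Box-Cox parameter; not estimated in this sketch


def boxcox(y, lam=LAMBDA):
    """Box-Cox transform for strictly positive y."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam


def inv_boxcox(z, lam=LAMBDA):
    """Inverse Box-Cox; the base is clamped so the result is always > 0."""
    z = np.asarray(z, dtype=float)
    if lam == 0:
        return np.exp(z)
    base = np.maximum(lam * z + 1.0, 1e-12)
    return base ** (1.0 / lam)


def fit(X, Y, eps=1e-6):
    """Fit least-squares models: one for m, four for the transformed gaps.

    X: (n, 5) regressor boxplots; Y: (n, 5) response boxplots (m, Q1, Q2, Q3, M).
    eps keeps zero-width gaps strictly positive before the transform.
    """
    A = np.column_stack([np.ones(len(X)), X])  # intercept + five statistics
    m = Y[:, 0]
    gaps = np.diff(Y, axis=1)  # Q1-m, Q2-Q1, Q3-Q2, M-Q3 (non-negative)
    coef_m = np.linalg.lstsq(A, m, rcond=None)[0]
    coef_g = np.linalg.lstsq(A, boxcox(gaps + eps), rcond=None)[0]
    return coef_m, coef_g


def predict(coefs, X):
    """Rebuild boxplots as m plus cumulative sums of positive predicted gaps."""
    coef_m, coef_g = coefs
    A = np.column_stack([np.ones(len(X)), X])
    m_hat = A @ coef_m
    gaps_hat = inv_boxcox(A @ coef_g)  # positive, so the order cannot cross
    return np.column_stack([m_hat, m_hat[:, None] + np.cumsum(gaps_hat, axis=1)])


# Usage on synthetic data (illustrative only, not the paper's datasets)
rng = np.random.default_rng(0)
X = np.sort(rng.normal(size=(50, 5)), axis=1)
Y = np.sort(2.0 * X + rng.normal(scale=0.1, size=(50, 5)), axis=1)
pred = predict(fit(X, Y), X)
print(np.all(np.diff(pred, axis=1) >= 0))  # m <= Q1 <= Q2 <= Q3 <= M
```

Modelling the gaps rather than the five statistics directly is what makes the ordering hold for every prediction: five unconstrained regressions on m, Q1, Q2, Q3 and M can produce crossed values such as a predicted Q1 above the predicted Q2.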
ISSN: 0020-0255; 1872-6291
DOI: 10.1016/j.ins.2023.119758