Determination of gross calorific value in crude oil by variable selection methods applied to 13C NMR spectroscopy

Bibliographic Details
Published in:Fuel (Guildford) Vol. 311; p. 122527
Main Authors: de Paulo, Ellisson H., dos Santos, Francine D., Folli, Gabriely S., Santos, Layla P., Nascimento, Márcia H.C., Moro, Mariana K., da Cunha, Pedro H.P., Castro, Eustáquio V.R., Cunha Neto, Alvaro, Filgueiras, Paulo R.
Format: Journal Article
Language:English
Published: Kidlington Elsevier Ltd 01-03-2022
Elsevier BV
Description
Summary:
• 13C NMR was used to estimate gross calorific value (GCV).
• Ten variable selection methods were applied to 13C NMR data to estimate GCV.
• The PSO-PLS model performed better than PLS with full spectra.
• Particle swarm optimization (PSO) presented the highest accuracy.
• PSO selected 701 variables in the paraffinic region to estimate GCV.

Gross calorific value (GCV) is one of the properties used to assess the quality and value of fuel in the oil industry, but the standard method for measuring it is laborious. Regression models built with carbon-13 nuclear magnetic resonance (13C NMR) data make it possible to estimate different physicochemical properties of oils. A drawback, however, is the enormous amount of chemical information produced in a single spectrum, and not all variables contribute to the model. Variable selection (VS) methods make it possible to find the information that has the highest correlation with the property of interest. In our study, we applied different methods to 13C NMR data of 145 Brazilian crude oil samples with GCV ranging from 41.5 to 47 MJ∙kg−1. For variable selection we used genetic algorithm (GA), variable importance in projection (VIP), uninformative variable elimination (UVE), angular search algorithm with variance inflation factor (ASA-VIF), competitive adaptive reweighted sampling (CARS), synergy interval partial least squares (siPLS), interval partial least squares (iPLS), subwindow permutation analysis (SPA), ordered predictors selection (OPS) and particle swarm optimization (PSO). We also used orthogonal projection to latent structures (OPLS) and partial least squares with full spectra (PLS-full). All models using variable selection obtained a lower root mean squared error of prediction (RMSEP) than the model based on the full spectral range (PLS-full). PSO was the most accurate model, with an RMSEP of 0.152 MJ∙kg−1. PSO selected the paraffinic region (0 to 4.43 ppm), totaling 701 variables. Statistical analysis showed no trends in the residuals at the 5% significance level for the best models.
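
As a rough illustration of the two ideas the summary relies on, the sketch below computes RMSEP and ranks spectral variables by their absolute correlation with the property of interest. It is a minimal example under stated assumptions, not the authors' procedure: the correlation filter is a simple stand-in and is none of the ten selection methods listed above, the function names (`rmsep`, `pearson`, `select_top_k`) are hypothetical, and the data are made-up toy values.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction over a test set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_top_k(X, y, k):
    """Indices of the k columns of X most correlated (in absolute value) with y.

    Hypothetical helper: a plain correlation filter, used here only to
    illustrate the idea of variable selection.
    """
    n_vars = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_vars)]
    ranked = sorted(range(n_vars), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy "spectra": 4 samples x 3 variables (illustrative values only).
X = [
    [1.0, 5.0, 0.3],
    [2.0, 5.0, 0.1],
    [3.0, 5.0, 0.4],
    [4.0, 5.0, 0.2],
]
y = [42.0, 43.0, 44.0, 45.0]  # toy GCV values in MJ/kg

selected = select_top_k(X, y, k=1)
error = rmsep([42.0, 44.0], [42.1, 43.9])
```

In a workflow like the one described, a regression model such as PLS would then be fitted on the selected columns only, and its RMSEP on a held-out test set compared with that of the full-spectrum model.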
ISSN:0016-2361
1873-7153
DOI:10.1016/j.fuel.2021.122527