Use of clustering with partial least squares regression for predictions based on hyperspectral data
Published in: | 2014 6th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS) pp. 1 - 4 |
---|---|
Main Authors: | , , , , , , , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE 01-06-2014 |
Subjects: | |
Summary: | Visible and near-infrared (VNIR) diffuse reflectance spectroscopy (DRS) has proven to be an effective tool for estimating soil properties. Regression models are usually calibrated on the entire dataset without stratification. This paper discusses how clustering of the soil spectra improves prediction of basic soil properties: contents of sand, clay, soil organic carbon (SOC) and total nitrogen, as well as the cation exchange capacity (CEC) and total exchangeable bases (TEB). The analysis was performed on a set of 212 soil samples collected from surface horizons across the arable lands of Poland. Spectral measurements were made with an ASD FieldSpec Pro with an attached Mug-Lite source probe over the wavelength range of 350-2500 nm. First, partial least squares (PLS) regression models using the raw spectra and their first derivatives were calibrated on the entire dataset. Then, the observations (soil samples) were clustered using the Ward and k-means methods, based on both the raw and the transformed spectral data. PLS regression models were then fitted separately within each cluster. Our findings indicate that clustering is potentially useful for enhancing the prediction of soil properties from DRS data when PLS regression modeling is used. For a practical application to a given set of soil samples, this paper recommends a two-step procedure. In the first step, one runs a cross-validation analysis to identify the best combination of spectral transformation, clustering method, and number of clusters. In the second step, the best combination is applied to the whole dataset for prediction purposes. The improvement achieved by the described procedure ranges from a 24% to a 49% reduction in the cross-validation root mean squared error. |
---|---|
ISSN: | 2158-6276 |
DOI: | 10.1109/WHISPERS.2014.8077597 |