Oil palm yield prediction across blocks from multi-source data using machine learning and deep learning

Bibliographic Details
Published in: Earth Science Informatics Vol. 15; no. 4; pp. 2349-2367
Main Authors: Ang, Yuhao, Shafri, Helmi Zulhaidi Mohd, Lee, Yang Ping, Bakar, Shahrul Azman, Abidin, Haryati, Mohd Junaidi, Mohd Umar Ubaydah, Hashim, Shaiful Jahari, Che’Ya, Nik Norasma, Hassan, Mohd Roshdi, Lim, Hwee San, Abdullah, Rosni, Yusup, Yusri, Muhammad, Syahidah Akmal, Teh, Sin Yin, Samad, Mohd Na’aim
Format: Journal Article
Language: English
Published: Berlin/Heidelberg Springer Berlin Heidelberg 01-12-2022
Springer Nature B.V
Description
Summary: Crop yields are affected by various factors, including weather, nutrients and management practices. Predicting yields over large areas in a timely and accurate manner while accounting for these factors is essential for mitigating climate risk and ensuring food security, particularly in light of climate change and the escalation of extreme climatic events. In this study, we integrated multi-source data, i.e. satellite-derived vegetation indices (VIs), satellite-derived climatic variables (land surface temperature (LST) and rainfall), weather station data and field surveys, and built one multiple linear regression (MLR) model, three machine learning models (XGBoost, support vector regression (SVR) and random forest (RF)) and one deep learning model (a deep neural network, DNN) to predict oil palm yield at block level within an oil palm plantation. A time-series moving average and backward elimination feature selection were applied at the pre-processing stage. Model performance was then compared using evaluation metrics, and the final spatial prediction map was generated from the best-performing model. DNN achieved the best performance with both the selected predictors (R² = 0.91; RMSE = 2.92 t ha⁻¹; MAE = 2.56 t ha⁻¹; MAPE = 0.09 t ha⁻¹) and the full predictors (R² = 0.76; RMSE = 3.03 t ha⁻¹; MAE = 2.88 t ha⁻¹; MAPE = 0.10 t ha⁻¹). In addition, advanced ensemble machine learning (ML) techniques such as XGBoost may be used as a supplementary approach for oil palm yield prediction at the block level. Among the models, MLR recorded the lowest performance. Using backward elimination to identify the most significant predictors improved all models: R² increased by 5–26%, while RMSE decreased by 3–31%, MAE by 7–34% and MAPE by 1–15%.
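The four evaluation metrics named in the summary (R², RMSE, MAE and MAPE) can be illustrated with a minimal sketch. This is not the authors' code: the function name `regression_metrics` and the sample yields are hypothetical, and MAPE is assumed here to be expressed as a fraction, so that a value of 0.09 would correspond to a 9% mean error.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAE and MAPE for paired observed/predicted values.

    Assumes every observed value is non-zero (needed for MAPE) and that the
    observed values are not all identical (needed for R^2).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_obs = sum(y_true) / n
    residuals = [obs - pred for obs, pred in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)                  # residual sum of squares
    ss_tot = sum((obs - mean_obs) ** 2 for obs in y_true)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        # MAPE as a fraction of the observed value, not a percentage
        "mape": sum(abs(r / obs) for r, obs in zip(residuals, y_true)) / n,
    }

# Hypothetical block-level yields in t/ha (illustrative, not from the study).
observed = [20.0, 25.0, 30.0, 35.0]
predicted = [22.0, 24.0, 29.0, 37.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

RMSE and MAE carry the units of the yield (t ha⁻¹), whereas R² and MAPE are dimensionless, which is worth keeping in mind when comparing the figures reported above.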
After backward elimination, the DNN achieved the highest prediction accuracy of all the models, with a 14% increase in R², an 11% decrease in RMSE, a 32% decrease in MAE and a 1% decrease in MAPE. Our study developed efficient and accurate models for the timely prediction of oil palm yield over a large area by integrating data from multiple sources. These models would help plantation managers estimate oil palm yields and speed up decision-making for sustainable production.
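The time-series moving average mentioned in the summary as a pre-processing step can be sketched as follows. The record does not state the window length or whether the window is trailing or centred, so the trailing window of 3, the function name `trailing_moving_average` and the NDVI values are all assumptions made for illustration.

```python
def trailing_moving_average(values, window):
    """Smooth a series with the mean of the last `window` points.

    Early positions that have fewer than `window` points available are
    averaged over the points seen so far.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the point that has just left the window.
            running_sum -= values[i - window]
        count = min(i + 1, window)
        smoothed.append(running_sum / count)
    return smoothed

# Hypothetical monthly vegetation-index values for one block (illustrative only).
ndvi = [0.60, 0.66, 0.72, 0.60, 0.54]
ndvi_smoothed = trailing_moving_average(ndvi, window=3)
print(ndvi_smoothed)
```

A trailing window uses only past observations, which keeps the smoothed predictor free of information from later dates; a centred window would smooth more symmetrically but would not have that property.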
ISSN: 1865-0473, 1865-0481
DOI: 10.1007/s12145-022-00882-9