Bayesian model averaging to improve the yield prediction in wheat breeding trials

Bibliographic Details
Published in: Agricultural and Forest Meteorology Vol. 328; p. 109237
Main Authors: Fei, Shuaipeng; Chen, Zhen; Li, Lei; Ma, Yuntao; Xiao, Yonggui
Format: Journal Article
Language: English
Published: Elsevier B.V. 15-01-2023
Description
Summary:
• The Boruta algorithm reduced the dimensionality of the hyperspectral data.
• Bayesian model averaging outperformed individual machine learning models.
• Diversity among ensemble members helps improve the accuracy of the ensemble model.

Accurate pre-harvest prediction of wheat yield from secondary traits facilitates plant breeding and reduces costs. Machine learning (ML) algorithms are increasingly applied to grain yield prediction with remote sensing data. However, the performance of individual ML algorithms varies across species and environments because of different sources of uncertainty. This study proposed a novel wheat yield prediction framework based on canopy hyperspectral reflectance (350–2500 nm) and adopted the ensemble Bayesian model averaging (EBMA) method to improve model performance. To develop the yield prediction models, important bands extracted by the Boruta feature selection method were fed into four linear ML models and four nonlinear ML models. Bayesian model averaging (BMA) weights, obtained from the cross-validation performance of each model, were then used to combine the predictions of the individual ML models. Compared with the best-performing individual model, EBMA models that integrated only the linear models or only the nonlinear models achieved just a slight accuracy improvement. The simultaneous integration of two linear models and two nonlinear models was also analyzed. The results indicate that most EBMA combinations of mixed linear and nonlinear models achieved higher prediction accuracy than both the combinations integrating a single type of model and the best-performing individual model. The advantage of the EBMA method is that it produces a predictive distribution that reflects the uncertainty associated with deterministic predictions. By taking full account of the diversity of ensemble members, the EBMA modeling framework provides an alternative method for predicting grain yield in plant breeding trials.
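
The weighting step described in the summary can be illustrated with a minimal sketch. It assumes the common Gaussian-mixture formulation of BMA, fitted by expectation-maximization with a single shared variance; the abstract does not specify the authors' exact procedure. The function names `bma_weights` and `bma_predict`, the three stand-in models, and the synthetic yield values are all hypothetical and are not taken from the paper.

```python
import numpy as np


def bma_weights(cv_preds, y, n_iter=200, tol=1e-8):
    """Estimate BMA weights and a shared variance by EM (illustrative assumption).

    cv_preds: (n_samples, n_models) out-of-fold predictions of each model.
    y:        (n_samples,) observed values.
    """
    cv_preds = np.asarray(cv_preds, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = cv_preds.shape
    w = np.full(k, 1.0 / k)                # start from equal weights
    sq_err = (y[:, None] - cv_preds) ** 2  # squared error per sample and model
    sigma2 = sq_err.mean()
    for _ in range(n_iter):
        # E-step: posterior probability that each model "explains" each sample
        log_dens = -0.5 * np.log(2 * np.pi * sigma2) - sq_err / (2 * sigma2)
        log_post = np.log(w + 1e-300) + log_dens
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        z = np.exp(log_post)
        z /= z.sum(axis=1, keepdims=True)
        # M-step: update the weights and the shared variance
        w_new = z.mean(axis=0)
        sigma2 = max((z * sq_err).sum() / n, 1e-12)
        converged = np.abs(w_new - w).max() < tol
        w = w_new
        if converged:
            break
    return w, sigma2


def bma_predict(preds, w, sigma2):
    """Return the mean and variance of the BMA predictive mixture."""
    preds = np.asarray(preds, dtype=float)
    mean = preds @ w
    # Mixture variance = between-model spread + within-model variance
    var = (w * (preds - mean[:, None]) ** 2).sum(axis=1) + sigma2
    return mean, var


# Synthetic stand-in for cross-validated predictions of three models
# with increasing error (values are illustrative, not real yield data).
rng = np.random.default_rng(0)
y = rng.normal(6.0, 1.0, size=200)
cv_preds = np.column_stack(
    [y + rng.normal(0.0, s, size=200) for s in (0.2, 0.6, 1.0)]
)

w, sigma2 = bma_weights(cv_preds, y)
mean, var = bma_predict(cv_preds, w, sigma2)
print("weights:", w.round(3))
```

In this sketch the most accurate stand-in model receives the largest weight, and the returned variance is one way to express the predictive uncertainty that the summary attributes to EBMA.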
ISSN: 0168-1923, 1873-2240
DOI: 10.1016/j.agrformet.2022.109237