Bayesian model averaging to improve the yield prediction in wheat breeding trials
Published in: | Agricultural and forest meteorology Vol. 328; p. 109237 |
---|---|
Main Authors: | , , , , |
Format: | Journal Article |
Language: | English |
Published: | Elsevier B.V., 15-01-2023 |
Summary: |
• Boruta algorithm reduced the dimensionality of hyperspectral data.
• Bayesian model averaging outperformed individual machine learning models.
• The diversity of ensemble members helps improve the accuracy of the ensemble model.
Accurate pre-harvest prediction of wheat yield through secondary traits helps to facilitate plant breeding and reduce costs. Machine learning (ML) algorithms are increasingly applied to grain yield prediction with remote sensing data. However, the performance of individual ML algorithms varies across species and environments because of different sources of uncertainty. This study proposed a novel wheat yield prediction framework based on canopy hyperspectral reflectance (350–2500 nm) and adopted the ensemble Bayesian model averaging (EBMA) method to improve model performance. To develop the yield prediction models, important bands extracted by the Boruta feature selection method were fed into four linear ML models and four nonlinear ML models. Bayesian model averaging (BMA) weights, obtained from model cross-validation performance, were then used to combine the predictions of the individual ML models. Compared to the best-performing individual model, EBMA models that integrated only the linear models or only the nonlinear models achieved just a slight accuracy improvement. Additionally, the simultaneous integration of two linear models and two nonlinear models was analyzed. Results indicate that most EBMA combinations of mixed linear and nonlinear models achieved higher prediction accuracy than both the combinations integrating a single type of model and the best-performing individual model. The advantage of the EBMA method is that it produces a predictive distribution that reflects the uncertainty associated with deterministic predictions. With full consideration of the model diversity of ensemble members, the EBMA modeling framework provides an alternative method for predicting grain yield in plant breeding trials. |
---|---|
ISSN: | 0168-1923 1873-2240 |
DOI: | 10.1016/j.agrformet.2022.109237 |
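
The summary states that BMA weights were obtained from cross-validation performance and used to combine the individual model predictions, but this record does not give the fitting procedure. The sketch below is therefore not the authors' implementation. It assumes one common way of fitting BMA weights: expectation-maximization over out-of-fold predictions, with Gaussian predictive densities that share a single variance. The function names `bma_weights` and `bma_predict` and the synthetic yield data are hypothetical and used only for illustration.

```python
import math
import random


def bma_weights(cv_preds, y, n_iter=200, tol=1e-8):
    """Estimate BMA weights by EM from out-of-fold predictions.

    cv_preds: list of K lists, each holding n cross-validated predictions.
    y: list of n observed yields.
    Assumption: Gaussian predictive densities with one shared variance.
    """
    k, n = len(cv_preds), len(y)
    weights = [1.0 / k] * k
    # Start from the pooled mean squared error of all ensemble members.
    sigma2 = sum(
        (p - t) ** 2 for preds in cv_preds for p, t in zip(preds, y)
    ) / (k * n)
    sigma2 = max(sigma2, 1e-12)
    for _ in range(n_iter):
        # E-step: responsibility of each member for each observation,
        # computed in log space to avoid underflow.
        resp = []
        for i in range(n):
            logs = [
                math.log(max(weights[j], 1e-300))
                - 0.5 * (y[i] - cv_preds[j][i]) ** 2 / sigma2
                for j in range(k)
            ]
            top = max(logs)
            dens = [math.exp(v - top) for v in logs]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update the weights and the shared variance.
        new_weights = [sum(r[j] for r in resp) / n for j in range(k)]
        sigma2 = sum(
            resp[i][j] * (y[i] - cv_preds[j][i]) ** 2
            for i in range(n)
            for j in range(k)
        ) / n
        sigma2 = max(sigma2, 1e-12)
        shift = max(abs(a - b) for a, b in zip(new_weights, weights))
        weights = new_weights
        if shift < tol:
            break
    return weights, sigma2


def bma_predict(weights, member_preds):
    """Weighted average of member predictions for each sample."""
    n = len(member_preds[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, member_preds))
        for i in range(n)
    ]


# Synthetic illustration: one accurate and one noisy ensemble member.
random.seed(0)
y_obs = [random.uniform(4.0, 9.0) for _ in range(200)]  # made-up yields
good = [t + random.gauss(0, 0.3) for t in y_obs]
poor = [t + random.gauss(0, 1.5) for t in y_obs]
weights, sigma2 = bma_weights([good, poor], y_obs)
combined = bma_predict(weights, [good, poor])
```

Under these assumptions, the fitted weights and shared variance define a Gaussian mixture centered on the member predictions, which is one way to obtain the kind of predictive distribution the summary describes. The weighted mean returned by `bma_predict` is the corresponding point prediction.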