Training Set Construction for Genomic Prediction in Auto-Tetraploids: An Example in Potato

Bibliographic Details
Published in: Frontiers in Plant Science Vol. 12; p. 771075
Main Authors: Wilson, Stefan; Malosetti, Marcos; Maliepaard, Chris; Mulder, Han A; Visser, Richard G F; van Eeuwijk, Fred
Format: Journal Article
Language: English
Published: Switzerland, Frontiers Media S.A, 24-11-2021
Description
Summary: Training set construction is an important prerequisite to genomic prediction (GP). It has been studied in diploids, but polyploids have not received the same attention. Polyploidy is common in crop plants such as banana and blueberry, and also in potato, which is the third most important crop in the world in terms of food consumption, after rice and wheat. The aim of this study was to investigate the impact of different training set construction methods using a publicly available diversity panel of tetraploid potatoes. Four methods of training set construction were compared: simple random sampling, stratified random sampling, genetic distance sampling, and sampling based on the coefficient of determination (CDmean). For stratified random sampling, population structure analyses were carried out to define sub-populations, but because sub-populations accounted for only 16.6% of genetic variation, differences between stratified and simple random sampling were negligible. For genetic distance sampling, four genetic distance measures were compared; they performed similarly, but Euclidean distance was the most consistent. In the majority of cases CDmean was the best sampling method: compared to simple random sampling it gave improvements of 4-14% in cross-validation scenarios and 2-8% in scenarios with an independent test set, while genetic distance sampling gave improvements of 5.5-10.5% and 0.4-4.5%, respectively. No interaction was found between sampling method and the statistical model for the traits analyzed.
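To make the sampling methods named in the summary more concrete, the following is a minimal Python sketch of three of them. It is not the authors' implementation: the function names, the input layout (a dictionary of genotype id to allele dosages coded 0-4 for a tetraploid, and a dictionary of genotype id to sub-population label), proportional allocation for the stratified draw, and the greedy farthest-point rule used for distance-based selection are all assumptions made for illustration. CDmean is omitted because it requires a mixed-model formulation that the summary does not spell out.

```python
import math
import random
from collections import defaultdict


def simple_random_sample(ids, n, rng):
    """Draw n genotype ids uniformly at random, without replacement."""
    return rng.sample(list(ids), n)


def stratified_random_sample(groups, n, rng):
    """Draw roughly n ids, allocating to each sub-population in
    proportion to its size (assumed allocation rule).

    groups: dict mapping genotype id -> sub-population label (hypothetical).
    Rounding can make the total differ slightly from n.
    """
    by_group = defaultdict(list)
    for gid, label in groups.items():
        by_group[label].append(gid)
    total = len(groups)
    chosen = []
    for label in sorted(by_group):
        members = by_group[label]
        k = round(n * len(members) / total)
        chosen.extend(rng.sample(members, min(k, len(members))))
    return chosen


def euclidean_distance(a, b):
    """Euclidean distance between two marker dosage vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_based_sample(dosages, n, rng):
    """Greedy farthest-point selection (an assumed stand-in for
    genetic distance sampling): start from a random genotype, then
    repeatedly add the genotype whose nearest selected neighbour
    is farthest away.

    dosages: dict mapping genotype id -> list of dosages in 0..4.
    """
    ids = sorted(dosages)
    chosen = [rng.choice(ids)]
    while len(chosen) < n:
        remaining = [g for g in ids if g not in chosen]
        best = max(
            remaining,
            key=lambda g: min(
                euclidean_distance(dosages[g], dosages[c]) for c in chosen
            ),
        )
        chosen.append(best)
    return chosen
```

For example, with `rng = random.Random(1)`, calling `distance_based_sample(dosages, 50, rng)` on a hypothetical dosage dictionary would return 50 genotype ids spread across marker space, which can then be compared against `simple_random_sample(dosages, 50, rng)` as a baseline.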
Edited by: Rodomiro Ortiz, Swedish University of Agricultural Sciences, Sweden
Reviewed by: Luis Felipe Ventorim Ferrão, University of Florida, United States; Marcio Resende, University of Florida, United States; John Edward Bradshaw, The James Hutton Institute, United Kingdom
This article was submitted to Plant Breeding, a section of the journal Frontiers in Plant Science
ISSN: 1664-462X
DOI: 10.3389/fpls.2021.771075