Strategies for choosing core animals in the algorithm for proven and young and their impact on the accuracy of single-step genomic predictions in cattle

Bibliographic Details
Published in:	Animal (Cambridge, England) Vol. 17; no. 4; p. 100766
Main Authors:	Cesarani, A., Bermann, M., Dimauro, C., Degano, L., Vicario, D., Lourenco, D., Macciotta, N.P.P.
Format:	Journal Article
Language:	English
Published:	England: Elsevier B.V., 01-04-2023
Subjects:	Algorithms; Animals; Cattle - genetics; Female; Genome; Genomic selection; Genomics - methods; Genotype; Key individuals; Models, Genetic; Phenotype; Prediction accuracy; Principal component analysis; Relationship matrix

Description
Summary:	• With more than 150 000 genotypes, the genomic relationship matrix cannot be inverted.
• The algorithm for proven and young animals can be used to solve this problem.
• Prediction accuracies increase as the core size increases.
• Lowest accuracies when animals with greater contribution were included in the core.
• Results should be validated on larger genotyped populations.

In some populations, the number of genotyped animals is now too large to obtain the inverse of the genomic relationship matrix (G). The algorithm for proven and young animals (APY) can be used to overcome this problem. In the present work, different strategies for defining core animals in APY were tested using simulated and real data. In particular, core definitions based on random choice or on the contribution to G (GCONTR), calculated using Principal Component Analysis, were tested. Core sizes able to explain 90, 95, 98, and 99% of the total variance of G were used. The analyzed phenotypes were three simulated traits for 3 000 individuals and milkability records for 136 406 Italian Simmental cows. The number of genotypes was 4 100 for the simulated dataset and 11 636 for the Simmental data. The GCONTR values in the Simmental dataset were moderately correlated with the analyzed phenotype, and they showed a decreasing trend with the year of birth of the genotyped animals. The accuracy increased as the size of the core increased in both datasets. Including the animals with the largest GCONTR values in the core led to the lowest accuracies (0.50 and 0.71 for the simulated and Simmental datasets, respectively; average across traits and core sizes). In contrast, selecting the animals ranked lowest for their contribution to G gave slightly higher accuracies, especially in the simulated dataset (0.68 for the simulated dataset and 0.76 for the Simmental data; average across traits and core sizes). In real data, particularly for larger core sizes, the selection criterion appears less important, confirming the results of earlier studies. Nevertheless, including the animals with the lowest GCONTR values in the core led to increases in accuracy. These are preliminary results based on a small sample size and need to be confirmed on a larger number of genotypes.
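Illustration:	The two computational steps named in the summary (choosing a core size from the variance of G explained, and building the APY inverse for a given core) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: G is built with the common VanRaden formula, a small ridge (0.01) is added to the diagonal for numerical stability, the core is drawn at random, and the function names `vanraden_g`, `core_size_for_variance`, and `apy_inverse` are hypothetical. The GCONTR-based core ranking is not reproduced, because the summary does not define how GCONTR is calculated.

```python
import numpy as np


def vanraden_g(M):
    """Genomic relationship matrix from genotypes coded 0/1/2.

    Assumption: VanRaden-style G = ZZ' / (2 * sum(p * (1 - p))).
    M has shape (n_animals, n_markers).
    """
    p = M.mean(axis=0) / 2.0          # observed allele frequencies
    Z = M - 2.0 * p                   # center each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def core_size_for_variance(G, fraction):
    """Smallest number of leading eigenvalues of G explaining `fraction` of its variance."""
    eigvals = np.linalg.eigvalsh(G)[::-1]      # descending order
    eigvals = np.clip(eigvals, 0.0, None)      # drop tiny negative round-off
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cum, fraction)) + 1
    return min(k, G.shape[0])


def apy_inverse(G, core):
    """Sparse-structured APY inverse of G for the given core indices.

    Non-core animals are treated as conditionally independent given the core,
    so only the core block G_cc is inverted densely.
    """
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)

    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    Gcn = G[np.ix_(core, noncore)]
    P = Gcc_inv @ Gcn                                   # G_cc^{-1} G_cn
    # Diagonal M_nn: m_j = g_jj - g_jc G_cc^{-1} g_cj for each non-core animal j
    m = G[noncore, noncore] - np.einsum("ij,ij->j", Gcn, P)

    Ginv = np.zeros((n, n))
    Ginv[np.ix_(core, core)] = Gcc_inv + (P / m) @ P.T
    Ginv[np.ix_(core, noncore)] = -P / m
    Ginv[np.ix_(noncore, core)] = (-P / m).T
    Ginv[noncore, noncore] = 1.0 / m                    # diagonal non-core block
    return Ginv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy genotypes (hypothetical data): 60 animals, 500 markers
    M = rng.integers(0, 3, size=(60, 500)).astype(float)
    G = vanraden_g(M) + 0.01 * np.eye(60)   # ridge is an assumption of this sketch

    for frac in (0.90, 0.95, 0.98, 0.99):
        print(frac, core_size_for_variance(G, frac))

    k = core_size_for_variance(G, 0.98)
    core = rng.choice(60, size=k, replace=False)        # random core definition
    Ginv_apy = apy_inverse(G, core)
    print(Ginv_apy.shape)
```

Only the core block is inverted directly, so the dense inversion cost depends on the core size rather than on the total number of genotyped animals; a production implementation would also store the non-core block as a diagonal rather than in a full matrix.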
ISSN:	1751-7311 1751-732X
DOI:	10.1016/j.animal.2023.100766