Inter-rater variability and repeatability in the assessment of the Tanner–Whitehouse classification of hand radiographs for the estimation of bone age

Bibliographic Details
Published in: Skeletal Radiology, Vol. 53, no. 12, pp. 2635-2642
Main Authors: Geng, Jian; Zhang, Wenshuang; Ge, Yufeng; Wang, Ling; Huang, Pengju; Liu, Yandong; Shi, Jia; Zhou, Fengyun; Ma, Kangkang; Blake, Glen M.; Xu, Gang; Yan, Dong; Cheng, Xiaoguang
Format: Journal Article
Language: English
Published: Berlin/Heidelberg Springer Berlin Heidelberg 01-12-2024
Springer Nature B.V
Description
Summary: Objective: To determine which bones and which grades had the highest inter-rater variability when employing the Tanner–Whitehouse (T-W) method. Materials and methods: Twenty-four radiologists were recruited and trained in the T-W classification of skeletal development. The consistency and skill of the radiologists in determining bone development status were assessed using 20 pediatric hand radiographs of children aged 1 to 18 years. Four radiologists had a poor concordance rate and were excluded. The remaining 20 radiologists undertook a repeat reading of the radiographs, and their results were compared against a reference standard, the mean assessment of two senior experts. Concordance rate, scoring, and Kendall’s W were calculated to evaluate accuracy and consistency. Results: Both the radius, ulna, and short finger (RUS) system (Kendall’s W = 0.833) and the carpal (C) system (Kendall’s W = 0.944) had excellent consistency, with the RUS system outperforming the C system in terms of scores. The repeatability analysis showed that the second rating test, performed after 2 months of further bone age assessment (BAA) practice, was more consistent and accurate than the first. The capitate had the lowest average concordance rate and scoring, and its D classification had the lowest overall concordance rate. Moreover, the G classifications of the seven carpal bones all had a concordance rate below 0.6. The bones with lower Kendall’s W were likewise those with lower scores and concordance rates. Conclusion: The D grade of the capitate showed the highest variation, and the use of the Tanner–Whitehouse 3rd edition (T-W3) to determine bone age (BA) was frequently inconsistent. A more comprehensive description focused on the inaccurately rated bones and grades, together with a modification of the T-W3 approach, would significantly advance BAA.
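The summary names two agreement statistics, Kendall’s W and the concordance rate, without defining them. The sketch below shows one common way to compute each; it is not the authors' code. The function names, the integer coding of grades, and the toy data are all hypothetical. Concordance is assumed here to mean the fraction of ratings that exactly match a reference grade, which may differ from the paper's definition (the paper's reference is the mean assessment of two senior experts).

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kendalls_w(ratings):
    """Tie-corrected Kendall's W; ratings[r][i] is rater r's grade for item i."""
    m = len(ratings)       # number of raters
    n = len(ratings[0])    # number of rated items
    ranked = [average_ranks(row) for row in ratings]
    rank_sums = [sum(ranked[r][i] for r in range(m)) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((x - mean_sum) ** 2 for x in rank_sums)
    # Tie correction: sum of (t^3 - t) over every group of tied grades per rater.
    ties = sum(t ** 3 - t for row in ratings for t in Counter(row).values())
    denom = m ** 2 * (n ** 3 - n) - m * ties
    if denom == 0:
        raise ValueError("W is undefined when every rater ties all items")
    return 12 * s / denom


def concordance_rate(ratings, reference):
    """Assumed definition: share of ratings equal to the reference grade."""
    hits = sum(g == ref for row in ratings for g, ref in zip(row, reference))
    return hits / (len(ratings) * len(reference))


# Hypothetical toy data: 3 raters, 4 items, grades coded as integers.
toy_ratings = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 4, 3]]
toy_reference = [1, 2, 3, 4]
w = kendalls_w(toy_ratings)
rate = concordance_rate(toy_ratings, toy_reference)
```

W ranges from 0 (no agreement in ranking) to 1 (identical rankings across raters), so the reported values of 0.833 and 0.944 sit near the top of the scale.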
ISSN: 0364-2348
1432-2161
DOI: 10.1007/s00256-024-04664-w