Using Machine Learning to Evaluate Attending Feedback on Resident Performance

Bibliographic Details
Published in: Anesthesia and Analgesia, Vol. 132, no. 2, pp. 545-555
Main Authors: Neves, Sara E., Chen, Michael J., Ku, Cindy M., Karan, Suzanne, DiLorenzo, Amy N., Schell, Randall M., Lee, Daniel E., Diachun, Carol Ann B., Jones, Stephanie B., Mitchell, John D.
Format: Journal Article
Language: English
Published: United States: Lippincott Williams & Wilkins, 01-02-2021
International Anesthesia Research Society
Description
Summary: BACKGROUND: High-quality, high-utility feedback allows for the development of improvement plans for trainees. The current manual assessment of the quality of this feedback is time-consuming and subjective. We propose the use of machine learning to rapidly distinguish the quality of attending feedback on resident performance.

METHODS: Using a preexisting databank of 1925 manually reviewed feedback comments from 4 anesthesiology residency programs, we trained machine learning models to predict whether comments contained 6 predefined feedback traits (actionable, behavior focused, detailed, negative feedback, professionalism/communication, and specific) and to predict the utility score of each comment on a scale of 1–5. Comments with ≥4 feedback traits were classified as high-quality, and comments with utility scores ≥4 were classified as high-utility; otherwise, comments were considered low-quality or low-utility, respectively. We used RapidMiner Studio (RapidMiner, Inc, Boston, MA), a data science platform, to train, validate, and score the performance of the models.

RESULTS: Models for predicting the presence of feedback traits had accuracies of 74.4%–82.2%. Predictions of utility category were 82.1% accurate, with 89.2% sensitivity and 89.8% class precision for low-utility predictions. Predictions of quality category were 78.5% accurate, with 86.1% sensitivity and 85.0% class precision for low-quality predictions. A research assistant with no prior machine learning experience spent 15 to 20 hours becoming familiar with the software, creating the models, and reviewing prediction performance; the program then read the data, applied the models, and generated predictions within minutes. In contrast, a recent manual scoring effort by one of the authors took 15 hours, spread over 2 weeks, to collate and score 200 comments.

CONCLUSIONS: Harnessing the potential of machine learning allows for rapid assessment of attending feedback on resident performance. Using predictive models to rapidly screen for low-quality and low-utility feedback can aid programs in improving feedback provision, both globally and by individual faculty.
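Note: the study's models were built in RapidMiner Studio, a visual data science platform, so the abstract includes no code. The following is a minimal illustrative sketch, assuming a Python/scikit-learn analogue rather than the authors' actual pipeline; the names TRAITS, train_trait_models, classify_quality, classify_utility, and utility_model are hypothetical. It shows the general shape of the method described: one binary text classifier per feedback trait, plus the abstract's thresholds (≥4 predicted traits gives high-quality; utility score ≥4 gives high-utility).

    # Illustrative sketch only -- the study used RapidMiner Studio, not Python,
    # and all names here are hypothetical, not the authors' own.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # The 6 predefined feedback traits from the abstract.
    TRAITS = [
        "actionable", "behavior_focused", "detailed",
        "negative_feedback", "professionalism_communication", "specific",
    ]

    def train_trait_models(comments, trait_labels):
        """Fit one binary text classifier per trait.
        comments: list of feedback strings.
        trait_labels: dict mapping trait name -> list of 0/1 labels."""
        models = {}
        for trait in TRAITS:
            pipe = make_pipeline(TfidfVectorizer(),
                                 LogisticRegression(max_iter=1000))
            pipe.fit(comments, trait_labels[trait])
            models[trait] = pipe
        return models

    def classify_quality(models, comment):
        """Count predicted traits; >=4 -> high-quality, else low-quality."""
        n_traits = sum(int(models[t].predict([comment])[0]) for t in TRAITS)
        return "high-quality" if n_traits >= 4 else "low-quality"

    def classify_utility(utility_model, comment):
        """utility_model predicts the 1-5 utility score; >=4 -> high-utility."""
        score = int(utility_model.predict([comment])[0])
        return "high-utility" if score >= 4 else "low-utility"

A utility_model could be trained analogously, e.g. make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)).fit(comments, utility_scores), treating the 1–5 score as a 5-class label before applying the ≥4 threshold.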
ISSN: 0003-2999
1526-7598
DOI: 10.1213/ANE.0000000000005265