Can natural language processing help differentiate inflammatory intestinal diseases in China? Models applying random forest and convolutional neural network approaches

Differentiating between ulcerative colitis (UC), Crohn's disease (CD) and intestinal tuberculosis (ITB) using endoscopy is challenging. We aimed to realize automatic differential diagnosis among these diseases through machine learning algorithms. A total of 6399 consecutive patients (5128 UC, 8...

Full description

Saved in:
Bibliographic Details
Published in:BMC medical informatics and decision making Vol. 20; no. 1; p. 248
Main Authors: Tong, Yuanren, Lu, Keming, Yang, Yingyun, Li, Ji, Lin, Yucong, Wu, Dong, Yang, Aiming, Li, Yue, Yu, Sheng, Qian, Jiaming
Format: Journal Article
Language:English
Published: England BioMed Central Ltd 29-09-2020
BioMed Central
BMC
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Differentiating between ulcerative colitis (UC), Crohn's disease (CD) and intestinal tuberculosis (ITB) using endoscopy is challenging. We aimed to realize automatic differential diagnosis among these diseases through machine learning algorithms. A total of 6399 consecutive patients (5128 UC, 875 CD and 396 ITB) who had undergone colonoscopy examinations in the Peking Union Medical College Hospital from January 2008 to November 2018 were enrolled. The input was the description of the endoscopic image in the form of free text. Word segmentation and key word filtering were conducted as data preprocessing. Random forest (RF) and convolutional neural network (CNN) approaches were applied to different disease entities. Three two-class classifiers (UC and CD, UC and ITB, and CD and ITB) and a three-class classifier (UC, CD and ITB) were built. The classifiers built in this research performed well, and the CNN had better performance in general. The RF sensitivities/specificities of UC-CD, UC-ITB, and CD-ITB were 0.89/0.84, 0.83/0.82, and 0.72/0.77, respectively, while the values for the CNN of CD-ITB were 0.90/0.77. The precisions/recalls of UC-CD-ITB when employing RF were 0.97/0.97, 0.65/0.53, and 0.68/0.76, respectively, and when employing the CNN were 0.99/0.97, 0.87/0.83, and 0.52/0.81, respectively. Classifiers built by RF and CNN approaches had excellent performance when classifying UC with CD or ITB. For the differentiation of CD and ITB, high specificity and sensitivity were achieved as well. Artificial intelligence through machine learning is very promising in helping unexperienced endoscopists differentiate inflammatory intestinal diseases. The abstract of this article has won the first prize of the Young Investigator Award during the Asian Pacific Digestive Week (APDW) 2019 held in Kolkata, India.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1472-6947
1472-6947
DOI:10.1186/s12911-020-01277-w