Computer aided classification of diagnostic terms in Spanish

Bibliographic Details
Published in: Expert Systems with Applications, Vol. 42, no. 6, pp. 2949-2958
Main Authors: Pérez, Alicia, Gojenola, Koldo, Casillas, Arantza, Oronoz, Maite, Díaz de Ilarraza, Arantza
Format: Journal Article
Language: English
Published: Elsevier Ltd 15-04-2015
Description
Summary:
• Framework to classify Medical Records by their diagnostic terms.
• Resources: ICD catalogue, SNOMED ontology, Medical Records and synonym dictionaries.
• Finite-State Transducers efficiently implement soft-matching operations.
• An F1-measure of 91.2 was achieved on a test set of 2850 diagnostic terms.

The goal of this paper is to classify Medical Records (MRs) by their diagnostic terms (DTs) according to the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). The challenge is twofold: (i) to handle the natural, non-standard language in which doctors express their diagnoses and (ii) to solve a large-scale classification problem. We propose the use of Finite-State Transducers (FSTs) which, owing to their underlying topology, constrain the allowed input DT strings while synchronously producing the output ICD-9-CM class. Their versatility is notable: they efficiently implement soft-matching operations that map terms expressed in natural language to standard terms and, hence, to the final ICD-9-CM code. The FSTs were built from corpora and from standard resources such as ICD-9-CM and SNOMED CT, among others. To model natural language in this domain, our corpora comprise more than 20,000 DTs taken from MRs of the Basque Hospital System. An F1-measure of 91.2 was achieved on a test set of 2850 randomly selected DTs, and a random 5-fold cross-validation on the training set served to double-check the stability of the results. Real MRs were very helpful in adapting the system to natural language. Misspellings, colloquial and domain-specific language, and abbreviations made classification difficult. The FSTs proved efficient in this large-scale classification task. Moreover, FST composition made it easy to add new features to the system.
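The abstract describes FSTs that read a DT and synchronously emit an ICD-9-CM code, with extra behaviour (such as synonym handling) plugged in through composition. The following is a minimal sketch of that idea under stated assumptions, not the authors' system: it works on whitespace tokens, is unweighted, covers soft matching only through a dictionary of token variants (no character-level edit distance), and uses a naive product construction for composition that is valid only because the normalizer never emits epsilon and the lexicon never reads it. The class `FST`, the helpers `compose`, `build_lexicon` and `build_normalizer`, and the entries in `LEXICON` and `VARIANTS` are all hypothetical; the two ICD-9-CM codes are included for illustration only.

```python
from collections import defaultdict

EPS = ""  # epsilon: the empty output label


class FST:
    """Minimal unweighted finite-state transducer over word tokens (hypothetical sketch)."""

    def __init__(self, start):
        self.start = start
        self.finals = set()
        self.arcs = defaultdict(list)  # state -> list of (input, output, next_state)

    def add_arc(self, src, inp, out, dst):
        self.arcs[src].append((inp, out, dst))

    def transduce(self, tokens):
        """Follow every path reading `tokens`; return outputs of paths ending in a final state."""
        frontier = {(self.start, ())}
        for tok in tokens:
            frontier = {
                (dst, outs if out == EPS else outs + (out,))
                for state, outs in frontier
                for inp, out, dst in self.arcs[state]
                if inp == tok
            }
        return {" ".join(outs) for state, outs in frontier if state in self.finals}


def compose(t1, t2):
    """Naive product construction for T1 o T2. Correct here only because T1 never
    outputs epsilon and T2 never reads epsilon, so arcs pair up on a shared label."""
    result = FST((t1.start, t2.start))
    stack, seen = [result.start], {result.start}
    while stack:
        state = stack.pop()
        q1, q2 = state
        if q1 in t1.finals and q2 in t2.finals:
            result.finals.add(state)
        for a, b, n1 in t1.arcs[q1]:
            for b2, c, n2 in t2.arcs[q2]:
                if b == b2:  # T1's output must equal T2's input
                    nxt = (n1, n2)
                    result.add_arc(state, a, c, nxt)
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return result


def build_lexicon(terms):
    """One chain of fresh states per standard term; only the last arc emits the code."""
    fst, next_state = FST(0), 1
    for term, code in terms.items():
        src, tokens = 0, term.split()
        for i, tok in enumerate(tokens):
            dst, next_state = next_state, next_state + 1
            fst.add_arc(src, tok, code if i == len(tokens) - 1 else EPS, dst)
            src = dst
        fst.finals.add(src)
    return fst


def build_normalizer(vocabulary, variants):
    """Single-state transducer: identity on canonical tokens, variant -> canonical token."""
    fst = FST(0)
    fst.finals.add(0)
    for tok in vocabulary:
        fst.add_arc(0, tok, tok, 0)
    for variant, canonical in variants.items():
        fst.add_arc(0, variant, canonical, 0)
    return fst


# Illustrative entries only; not taken from the paper's resources.
LEXICON = {"hipertension arterial": "401.9", "diabetes mellitus tipo 2": "250.00"}
VARIANTS = {"hipertensión": "hipertension", "diabetis": "diabetes", "art": "arterial"}

vocab = {tok for term in LEXICON for tok in term.split()}
classifier = compose(build_normalizer(vocab, VARIANTS), build_lexicon(LEXICON))

print(classifier.transduce("hipertensión art".split()))          # {'401.9'}
print(classifier.transduce("diabetis mellitus tipo 2".split()))  # {'250.00'}
print(classifier.transduce("asma".split()))                      # set()
```

In this sketch, adding another feature (for example, a transducer that expands abbreviations) would amount to composing one more transducer into the cascade. A real system would more likely build on an FST toolkit such as OpenFst, which supports weighted composition with proper epsilon handling.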
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2014.11.035