Biomedical Named Entity Recognition at Scale
Main Authors: | , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 12-11-2020 |
Subjects: | |
Summary: | Named entity recognition (NER) is a widely applicable natural language
processing task and a building block of question answering, topic modeling,
information retrieval, etc. In the medical domain, NER plays a crucial role by
extracting meaningful chunks from clinical notes and reports, which are then
fed to downstream tasks like assertion status detection, entity resolution,
relation extraction, and de-identification. Reimplementing a Bi-LSTM-CNN-Char
deep learning architecture on top of Apache Spark, we present a single
trainable NER model that obtains new state-of-the-art results on seven public
biomedical benchmarks without using heavy contextual embeddings like BERT. This
includes improving BC4CHEMD to 93.72% (4.1% gain), Species800 to 80.91% (4.6%
gain), and JNLPBA to 81.29% (5.2% gain). In addition, this model is freely
available within a production-grade code base as part of the open-source Spark
NLP library; can scale up for training and inference in any Spark cluster; has
GPU support and libraries for popular programming languages such as Python, R,
Scala and Java; and can be extended to support other human languages with no
code changes. |
---|---|
DOI: | 10.48550/arxiv.2011.06315 |
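
The summary describes NER as extracting meaningful chunks that are then fed to downstream tasks. As a rough illustration of that hand-off, the sketch below groups per-token tags into chunks. It assumes the tags follow the common BIO scheme (`B-` opens an entity, `I-` continues it, `O` is outside), which this record does not state; the function name `bio_to_chunks`, the example sentence, and the `PROBLEM` and `TREATMENT` labels are all hypothetical. It is not the Spark NLP API and not the authors' implementation.

```python
from typing import List, Optional, Tuple


def bio_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (chunk_text, entity_type) pairs.

    Hypothetical helper for illustration only; assumes BIO tagging.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        # "B-PROBLEM" -> ("B", "PROBLEM"); "O" -> ("O", "")
        prefix, _, entity_type = tag.partition("-")

        # An I- tag extends the open chunk only when the entity type matches.
        if prefix == "I" and current_tokens and entity_type == current_type:
            current_tokens.append(token)
            continue

        # Any other tag closes the open chunk, if there is one.
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

        # B- opens a new chunk; a stray I- is treated as a chunk start too.
        if prefix in ("B", "I") and entity_type:
            current_tokens, current_type = [token], entity_type

    # Flush a chunk that runs to the end of the sentence.
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_type))
    return chunks


# Hypothetical clinical sentence with made-up labels.
example_tokens = ["Patient", "denies", "chest", "pain", "after", "aspirin", "."]
example_tags = ["O", "O", "B-PROBLEM", "I-PROBLEM", "O", "B-TREATMENT", "O"]
example_chunks = bio_to_chunks(example_tokens, example_tags)
print(example_chunks)
```

Each resulting pair carries the chunk text and its entity type, which is the kind of unit a downstream step such as assertion status detection or entity resolution would consume.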