Normalization of Disorder Mentions in the Biomedical Domain


Normalization is the task of mapping a word or phrase in a document to a unique concept in an ontology, based on the description of that concept in the ontology, after disambiguating potentially ambiguous surface words or phrases. This task has variously been called entity disambiguation, record linkage, or entity linking.

Our sieve-based approach for the normalization of disorder mentions in biomedical data is described in the following paper.

Jennifer D’Souza and Vincent Ng. July 2015. Sieve-Based Entity Linking in the Biomedical Domain. In Proceedings of ACL-IJCNLP.

Our approach is simple yet effective for disorder mention normalization. Each sieve in our normalization system corresponds to a unique syntactic string transformation heuristic that converts a mention into one of its equivalent forms so that it can be matched against the form stored in the ontology. The figures below illustrate how the sieve-based system works, using only a subset of the sieves in our full system.

Normalization of Names in Clinical Notes
Figure 1 - Normalization of disorder mentions in clinical data at different sieve levels.
Normalization of Names in Biomedical Abstracts
Figure 2 - Normalization of disorder mentions in biomedical abstracts at different sieve levels.
Figures 1 and 2 - The sieve-based system is organized as tiers of mention normalization modules. Each module produces variations of a mention string via morphological or syntactic transformations, applied one at a time, in order from highest to lowest precision. (Preliminary system: Jennifer D'Souza. January 2015. A Multi-Pass Sieve for Name Normalization. In Proceedings of the AAAI Student Abstract and Poster Program.)
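
The tiered control flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the released tool: the three sieves (exact match, case and punctuation folding, naive plural stripping), the function names, and the toy ontology with its "TOY:" identifiers are all hypothetical and were chosen for brevity. The sieves in the actual system are described in the paper.

```python
import string
from typing import Callable, Dict, List, Optional

# A sieve maps a mention to one or more candidate surface forms.
Sieve = Callable[[str], List[str]]

def exact(mention: str) -> List[str]:
    """Hypothetical sieve 1: use the mention unchanged."""
    return [mention]

def fold_case_and_punct(mention: str) -> List[str]:
    """Hypothetical sieve 2: lowercase, drop punctuation, squeeze spaces."""
    table = str.maketrans("", "", string.punctuation)
    return [" ".join(mention.lower().translate(table).split())]

def strip_plural(mention: str) -> List[str]:
    """Hypothetical sieve 3: naively remove a trailing 's' from the last word."""
    words = fold_case_and_punct(mention)[0].split()
    if words and len(words[-1]) > 3 and words[-1].endswith("s"):
        words[-1] = words[-1][:-1]
    return [" ".join(words)]

def normalize(mention: str,
              ontology: Dict[str, str],
              sieves: List[Sieve]) -> Optional[str]:
    """Apply sieves in order; the first sieve that yields a match wins."""
    for sieve in sieves:
        for variant in sieve(mention):
            if variant in ontology:
                return ontology[variant]
    return None  # no sieve matched: the mention stays unnormalized

# Toy ontology with made-up concept identifiers (assumption, not real IDs).
TOY_ONTOLOGY = {"heart attack": "TOY:001", "headache": "TOY:002"}
# Sieves are listed from the most to the least conservative transformation.
SIEVES = [exact, fold_case_and_punct, strip_plural]

if __name__ == "__main__":
    for m in ["heart attack", "Heart Attack.", "Headaches", "fever"]:
        print(m, "->", normalize(m, TOY_ONTOLOGY, SIEVES))
```

Placing the most conservative transformation first means a mention is resolved by the most reliable rule that applies, and the looser rules are only tried on mentions that earlier sieves could not link.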


Download our sieve-based disorder normalization tool.

GitHub: Here is our disorder normalization tool's GitHub site.

Funding Statement

This work was supported in part by NSF Grants IIS-1147644 and IIS-1219142. Any opinions, findings, or conclusions expressed above are those of the authors and do not necessarily reflect the views or official policies of NSF.


Questions, feedback, and suggestions for improvement are welcome via email.