Leveraging Neural Machine Translation And Annotation Projection For Developing Multilingual Named Entity Recognition In Clinical NLP

Rama Krishna Raju Chekuri; Dr. RVVSV Prasad; A. Jayendra Sai; P. Prasanna Kumari; P. Venkata Raju

doi:10.62643/

Keywords:

Neural Machine Translation, annotation projection, clinical NLP, Named Entity Recognition, multilingual datasets, Catalan, expert validation

Abstract

Developing clinical natural language processing (NLP) tools for multilingual use is challenging because annotated datasets are scarce in many languages. This study presents an approach to building Named Entity Recognition (NER) systems for under-resourced languages using Neural Machine Translation (NMT) and annotation projection. Spanish clinical texts, already annotated by domain experts, were translated into Catalan with a high-quality NMT system, and the original annotations were then projected onto the Catalan translations.[1] To ensure annotation accuracy, clinical experts reviewed and corrected the projected annotations. The refined dataset was used to train a Catalan NER model, which achieved 90% accuracy on manually annotated test sets. These results show that the method can produce reliable clinical NLP resources for languages with limited training data. Because it minimizes the need for extensive manual annotation and can be adapted to other languages, the approach offers a scalable solution for multilingual clinical text processing. This work supports the development of more inclusive healthcare technologies by enabling NLP capabilities in diverse linguistic settings.
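The abstract does not say how annotations were carried from the Spanish source to the Catalan translation. A common way to do this is projection through word alignments, and the sketch below shows that idea under stated assumptions. It assumes token-level alignments between source and translation are already available from some aligner, which the abstract does not name. Each source entity is mapped to the smallest contiguous target span covering all aligned target tokens. Entities with no aligned tokens, or with projections that overlap, are left untagged for expert review. The function names `project_spans` and `spans_to_bio`, the `DISEASE` label, and the example sentence pair are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: an entity span is (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]


def project_spans(src_spans: List[Span], alignment: List[Tuple[int, int]]) -> List[Span]:
    """Project source entity spans onto target tokens via (src_idx, tgt_idx) word alignments.

    Assumption: alignments come from an external word aligner (not specified in the source).
    Each source span maps to the smallest contiguous target span covering every aligned
    target token; spans with no aligned target token are dropped for manual review.
    """
    # Index each source token to the set of target tokens it is aligned with.
    src_to_tgt: Dict[int, Set[int]] = {}
    for s, t in alignment:
        src_to_tgt.setdefault(s, set()).add(t)

    projected: List[Span] = []
    for start, end, label in src_spans:
        targets: Set[int] = set()
        for i in range(start, end):
            targets |= src_to_tgt.get(i, set())
        if targets:
            projected.append((min(targets), max(targets) + 1, label))
    return projected


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert spans to BIO tags; a span overlapping an already-tagged one is skipped."""
    tags = ["O"] * n_tokens
    for start, end, label in sorted(spans):
        if any(tag != "O" for tag in tags[start:end]):
            continue  # overlapping projection: leave for expert correction
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


if __name__ == "__main__":
    # Hypothetical sentence pair: Spanish source and its Catalan translation.
    src = ["Se", "diagnosticó", "cáncer", "de", "pulmón"]
    tgt = ["Es", "va", "diagnosticar", "càncer", "de", "pulmó"]
    # Hypothetical word alignment; "diagnosticó" aligns to "va diagnosticar".
    align = [(0, 0), (1, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
    proj = project_spans([(2, 5, "DISEASE")], align)
    print(list(zip(tgt, spans_to_bio(len(tgt), proj))))
```

Here the source entity "cáncer de pulmón" (tokens 2 to 4) projects onto "càncer de pulmó" (target tokens 3 to 5). The covering-span heuristic is a deliberate simplification: it can over-extend when alignments are noisy, which is one reason a manual correction pass like the expert review described above is still needed.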