Leveraging Neural Machine Translation And Annotation Projection For Developing Multilingual Named Entity Recognition In Clinical NLP
DOI: https://doi.org/10.62643/
Keywords: Neural Machine Translation, annotation projection, clinical NLP, Named Entity Recognition, multilingual datasets, Catalan, expert validation
Abstract
Developing clinical natural language processing (NLP) tools for multilingual use is challenging because annotated datasets are scarce in many languages. This study presents an approach to building Named Entity Recognition (NER) systems for under-resourced languages using Neural Machine Translation (NMT) and annotation projection. Spanish clinical texts, already annotated by domain experts, were translated into Catalan using a high-quality NMT system. The original annotations were then projected onto the Catalan translations.[1] To ensure data accuracy, clinical experts reviewed and corrected the projected annotations. The refined dataset was used to train a Catalan NER model, which achieved 90% accuracy on manually annotated test sets. These results show that the method can produce reliable clinical NLP resources for languages with limited training data. Because it minimizes the need for extensive manual annotation and can be adapted to other languages, the approach offers a scalable solution for multilingual clinical text processing. This work supports the development of more inclusive healthcare technologies by enabling NLP capabilities in diverse linguistic settings.
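The abstract does not say how the Spanish annotations were transferred to the Catalan translations. One common way to do annotation projection is through token-level word alignment, and the sketch below illustrates that technique under that assumption only; it is not presented as the authors' method. The function `project_spans`, the span format `(start, end_exclusive, label)`, the alignment pairs, and the `ENFERMEDAD` label are all hypothetical illustrations.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical span format: (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]


def project_spans(src_spans: List[Span], alignment: List[Tuple[int, int]]) -> List[Span]:
    """Project entity spans from source tokens to target tokens.

    Assumes `alignment` is a list of (source_index, target_index) pairs, as
    produced by some word aligner (an assumption; the paper does not specify one).
    Each source span maps to the smallest contiguous target range covering every
    target token aligned to it. Spans with no aligned tokens are dropped, which
    leaves them for manual (expert) annotation.
    """
    # Index the alignment: source token -> set of aligned target tokens.
    src_to_tgt: Dict[int, Set[int]] = {}
    for s, t in alignment:
        src_to_tgt.setdefault(s, set()).add(t)

    projected: List[Span] = []
    for start, end, label in src_spans:
        targets: Set[int] = set()
        for i in range(start, end):
            targets |= src_to_tgt.get(i, set())
        if targets:
            projected.append((min(targets), max(targets) + 1, label))
    return projected


# Hypothetical example: a Spanish sentence and a Catalan translation.
src = ["El", "paciente", "presenta", "diabetes", "mellitus", "."]
tgt = ["El", "pacient", "presenta", "diabetis", "mellitus", "."]
alignment = [(i, i) for i in range(len(src))]  # one-to-one, monotone alignment
spans = [(3, 5, "ENFERMEDAD")]

for start, end, label in project_spans(spans, alignment):
    print(label, " ".join(tgt[start:end]))  # ENFERMEDAD diabetis mellitus
```

Taking the covering range keeps each projected entity contiguous, but noisy or discontiguous alignments can widen a span beyond the true entity. Errors of this kind are what an expert review step like the one described above is meant to catch.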
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.