Northeastern University, Boston.
International Journal of Science and Research Archive, 2025, 16(03), 082-089
Article DOI: 10.30574/ijsra.2025.16.3.2499
Received on 20 July 2025; revised on 26 August 2025; accepted on 30 August 2025
Named entity recognition (NER) on digitized medical records produced by Optical Character Recognition (OCR) is difficult because of character distortions, inconsistent formatting, and domain-specific acronyms. These artifacts degrade both rule-based and machine learning models, which struggle to extract entities consistently under such noisy conditions. Hybrid techniques, i.e., the combination of deterministic rule-based modules with neural models such as transformers, offer a robust remedy to these problems. By combining lexicon-based rules with contextual embeddings, error correction mechanisms, and ensemble strategies, hybrid systems tolerate OCR-induced noise better and achieve higher precision and recall in clinical entity extraction (diagnoses, medications, and time-related entities). This article analyzes the challenges of processing OCR-generated medical text, the development and implementation of hybrid NER pipelines, their institution-agnostic scalability, and research directions such as multimodal learning, self-supervised pretraining on noisy data, and AI-assisted orchestration of large-scale healthcare systems.
OCR Medical Records; Named Entity Recognition; Hybrid NLP Approaches; Clinical Text Processing; Noise-Robust Models
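As a rough illustration of the hybrid idea summarized in the abstract, the following minimal Python sketch pairs a deterministic pass (a regex rule plus fuzzy lexicon lookup as a simple form of OCR error correction) with an optional neural tagger, and merges the two outputs by plain union. It is not the article's implementation: the lexicon, the date format, the confusion table, the 0.8 similarity cutoff, and the `neural_tagger` callable (a stand-in for a transformer-based model) are all assumptions made for illustration.

```python
import re
from difflib import get_close_matches

# Hypothetical mini-lexicon; a real system would load a curated drug vocabulary.
MEDICATION_LEXICON = ["metformin", "lisinopril", "atorvastatin", "warfarin"]

# Illustrative rule for dates such as 12/03/2024 (assumed format).
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

# A few common OCR digit-for-letter confusions (assumed, not exhaustive).
OCR_CONFUSIONS = str.maketrans({"0": "o", "1": "l", "5": "s"})


def correct_token(token, cutoff=0.8):
    """Return the closest lexicon entry for a noisy token, or None."""
    candidate = token.lower().translate(OCR_CONFUSIONS)
    matches = get_close_matches(candidate, MEDICATION_LEXICON, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def rule_based_entities(text):
    """Deterministic pass: regex dates plus fuzzy lexicon lookup."""
    entities = [(m.group(), "DATE") for m in DATE_PATTERN.finditer(text)]
    remainder = DATE_PATTERN.sub(" ", text)  # avoid re-reading date digits
    for token in re.findall(r"[A-Za-z0-9]+", remainder):
        if len(token) < 5:  # skip short tokens to limit false matches
            continue
        corrected = correct_token(token)
        if corrected:
            entities.append((corrected, "MEDICATION"))
    return entities


def hybrid_entities(text, neural_tagger=None):
    """Union of rule output and an optional tagger's output, rules first."""
    merged = rule_based_entities(text)
    if neural_tagger is not None:
        for entity in neural_tagger(text):  # hypothetical model interface
            if entity not in merged:
                merged.append(entity)
    return merged


if __name__ == "__main__":
    note = "Pt started on metf0rmin 500mg and warfar1n on 12/03/2024."
    print(hybrid_entities(note))
```

The union merge is the simplest possible ensemble; a weighted vote or a confidence threshold on the tagger's predictions would be natural refinements under the same interface.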
FNU Sudhakar Abhijeet. Hybrid Approaches for NER in Noisy OCR Medical Records. International Journal of Science and Research Archive, 2025, 16(03), 082-089. Article DOI: https://doi.org/10.30574/ijsra.2025.16.3.2499.
Copyright © 2025 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.







