AkademIndex
Article

Preserving Historical Documents Using OCR and Natural Language Processing (NLP)

Mirzokhid Askarov (Tashkent State University of Oriental Studies, The Department of History of the People of Central Asia, Uzbekistan)
Alisher Gafforov (Samarkand State University, Samarkand, Uzbekistan)
Adolat Darmonova (Chirchik State Pedagogical University)
Mokhlaroyim Dadakhonova (Andijan State Institute of Foreign Languages, Uzbekistan)
Tokhirjon Ismailov
Ugiljon Qushnazarova (Urgench State University, Department of Pedagogy and Psychology, Urgench City, Uzbekistan)

2025 · en
ABI

Abstract

Preserving historical documents is essential for safeguarding cultural heritage and making historical knowledge accessible to future generations. However, traditional digitization methods often fail to capture and process degraded or handwritten texts effectively, limiting searchability and usability. Existing Optical Character Recognition (OCR) techniques struggle with inaccuracies due to variations in handwriting styles, faded ink, and document deterioration, making it difficult to convert these texts into usable digital formats. This study proposes an OCR-NLP framework that combines Optical Character Recognition with Natural Language Processing techniques to address these limitations. OCR extracts text from historical manuscripts, while NLP enhances accuracy through contextual analysis, entity recognition, and language modeling. This hybrid approach improves text recognition quality, even for complex or degraded documents. The proposed method enables the creation of searchable digital archives, making historical manuscripts more accessible for researchers, historians, and the general public. By integrating machine learning-based text correction and semantic indexing, the system enhances the reliability of digital archives. Findings show that the OCR-NLP approach significantly improves text extraction accuracy and usability, ensuring better preservation and accessibility of historical records. This advancement fosters digital heritage preservation by transforming fragile manuscripts into structured, searchable, and readable formats.
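
The abstract does not specify the correction or indexing algorithms, so the following is only a minimal sketch of the general idea: noisy OCR output is corrected against a vocabulary, and the corrected text is placed in a searchable index. It assumes the OCR step has already produced raw strings, and it substitutes simple string-similarity matching (Python's standard `difflib`) for the language modeling the authors describe. The lexicon, the page identifiers, and the function names are all hypothetical.

```python
import difflib
import re
from collections import defaultdict

# Hypothetical vocabulary; a real archive would load a much larger
# period-appropriate lexicon or use a trained language model instead.
LEXICON = sorted({"the", "treaty", "was", "signed", "in", "samarkand", "by", "governor"})


def correct_token(token, lexicon=LEXICON, cutoff=0.7):
    """Map a noisy OCR token to its closest lexicon entry, if one is close enough."""
    lower = token.lower()
    if lower in lexicon:
        return lower
    # difflib ranks candidates by SequenceMatcher similarity ratio.
    matches = difflib.get_close_matches(lower, lexicon, n=1, cutoff=cutoff)
    # Unknown words are kept (lower-cased) rather than forced onto the lexicon.
    return matches[0] if matches else lower


def correct_text(raw, lexicon=LEXICON):
    """Tokenize raw OCR output and correct each token independently."""
    return " ".join(correct_token(t, lexicon) for t in re.findall(r"\w+", raw))


def build_index(pages):
    """Build an inverted index: corrected word -> set of page ids containing it."""
    index = defaultdict(set)
    for page_id, raw in pages.items():
        for word in correct_text(raw).split():
            index[word].add(page_id)
    return index


if __name__ == "__main__":
    # Assumed OCR output with typical character confusions (e/c, i/1, rn/m).
    pages = {
        "folio-1": "The trcaty was s1gned in Samarkcmd",
        "folio-2": "Signed by the govemor",
    }
    index = build_index(pages)
    print(correct_text(pages["folio-1"]))
    print(sorted(index["signed"]))
```

Correcting each token in isolation ignores context, which is exactly the gap that contextual analysis and entity recognition are meant to close; the sketch only illustrates where such a component would sit between OCR output and the search index.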