Historical Text Analysis Using Transformer Models for Language Decoding and Translation
Abstract
Deciphering and translating ancient texts is essential to understanding past civilizations, but standard linguistic approaches struggle because contextual details are missing and language patterns change over time. Recent advances in transformer models offer potential solutions for language decoding and translation. Current approaches rely primarily on rule-based or statistical methods, which perform poorly with incomplete texts, low-frequency linguistic patterns, and scarce annotated datasets, resulting in poor-quality translations. Furthermore, most available deep learning models do not transfer to lesser-known ancient scripts. To address these limitations, we propose the Decode and Translate Ancient Languages using BERT (DTAL-BERT) approach, which leverages BERT-based transformers for contextual language modeling. DTAL-BERT combines self-attention mechanisms with masked language modeling to reconstruct missing segments, improving translation precision and contextual understanding. Moreover, it applies transfer learning from modern language corpora to improve recognition of ancient text. The DTAL-BERT model enables more precise and efficient decoding and semantic analysis of ancient languages, helping linguists and historians interpret texts. Experimental validation confirms that DTAL-BERT translates more efficiently and accurately than conventional models and outperforms them in contextualization and robustness when reconstructing fragmented texts. The framework contributes significantly to preserving and comprehending ancient languages through AI-based methods.