Entity Recognition Algorithm for Structuring Literary Archives in Digital Humanities Education
Abstract
Entity recognition is central to organising literary archives: it lets scholars and students in the digital humanities uncover patterns, relations, and contextual meanings in texts. Accurate entity recognition supports teaching by enabling further processing of historical, cultural, and literary data, but current methods struggle with ambiguity, overlapping entities, and vocabulary specific to the literary domain. To address these problems, this work proposes a Transformer-Based Conditional Random Field (CRF) framework that combines the contextual learning capabilities of transformers with the ability of CRFs to predict coherent label sequences. The transformer component produces rich semantic embeddings, and the CRF layer labels complex text sequences consistently, generating structured metadata for educational use. The results indicate that the model substantially outperforms baseline methods in accuracy, recall, and adaptability, making it a dependable way to enrich literary archives for classroom use.
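
To illustrate why a CRF layer on top of transformer outputs can give more consistent labels than tagging each token independently, the following minimal sketch runs Viterbi decoding over per-token scores. It is not the implementation used in this work: the tag set, the emission scores (stand-ins for transformer outputs), the transition scores, and the function name `viterbi_decode` are all assumptions chosen for illustration.

```python
# Illustrative sketch only: all scores and tags below are hypothetical.
TAGS = ["O", "B-PER", "I-PER"]  # assumed BIO tag set


def viterbi_decode(emissions, transitions):
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]:   score of tag j at token t (stand-in for transformer output)
    transitions[i][j]: score of moving from tag i to tag j (CRF parameters)
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at position t
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for the three tokens "Jane", "Eyre", "wrote"
emissions = [
    [0.1, 2.0, 0.0],  # "Jane":  B-PER favoured
    [1.0, 0.2, 0.9],  # "Eyre":  O narrowly favoured when taken alone
    [2.0, 0.0, 0.0],  # "wrote": O favoured
]
transitions = [
    [0.5, 0.0, -5.0],  # from O:     O -> I-PER is heavily penalised
    [0.0, -1.0, 1.0],  # from B-PER: B-PER -> I-PER is rewarded
    [0.0, -1.0, 0.5],  # from I-PER
]

# Independent per-token tagging versus sequence-level decoding
greedy = [TAGS[max(range(len(TAGS)), key=lambda j: row[j])] for row in emissions]
labels = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
print(greedy)  # ['B-PER', 'O', 'O']
print(labels)  # ['B-PER', 'I-PER', 'O']
```

In this toy example, per-token tagging splits the two-token name, whereas sequence-level decoding keeps "Jane Eyre" together as one entity because the transition scores reward continuing an entity that has already begun.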