Article

Text Classification Algorithm Using BERT for Legal Document Analysis and Summarization

Adilbek Dauletov; Khurshida Bakhrieva; Ganixodjayeva Dilfuza; Anvar Abduraxmanov; M Shanmughapriya; Zahraa Eisa (Al-Mustaqbal University College, Intelligent Medical System Department, Hilla, Iraq); H. Auda (University of Hilla, College of Engineering, Medical Device Department, Babylon, Iraq, 51011); Murtadha Yehya (University of Al-Ameed Karbala, College of Medicine, Iraq)

2025

ABI

Abstract

The length, complexity, and dense domain-specific language of legal documents make them difficult to classify and analyze. Conventional natural language processing (NLP) techniques often lack the contextual depth needed to classify and summarize such texts accurately. This work presents a text classification method that uses Bidirectional Encoder Representations from Transformers (BERT) to analyze and summarize legal documents. The method fine-tunes a pre-trained BERT model on a custom-annotated legal dataset covering several document types, including statutes, contracts, and case law. Tokenization and text normalization were applied during preprocessing to improve the model's input. Evaluated with precision, recall, and F1-score, the model outperformed conventional classifiers, reaching an F1-score of 91.3%. BERT-based extractive summarization condensed long legal texts into concise, informative summaries while preserving their semantic integrity. Applied to legal research, the approach effectively identified relevant material and reduced manual effort. The results show that BERT significantly improves the quality of both classification and summarization in the legal domain. Overall, the proposed BERT-based framework automates legal document processing and thereby improves legal practitioners' capacity for decision-making and information retrieval.
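
The abstract reports precision, recall, and F1-score but does not say how they were averaged across document classes. As a minimal sketch, assuming macro averaging and using made-up labels (the names `per_class_prf` and `macro_f1` and the toy data are illustrative, not taken from the paper), the metrics could be computed as follows:

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that appears."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_prf(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Toy labels for illustration only; not data from the paper.
y_true = ["statute", "contract", "case_law", "contract", "statute", "case_law"]
y_pred = ["statute", "contract", "contract", "contract", "statute", "case_law"]

for label, (p, r, f1) in per_class_prf(y_true, y_pred).items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Macro averaging weights every class equally, which matters when some document types are rare; a micro or weighted average would give a different figure on imbalanced data.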

Identifiers

DOI: 10.1109/iccr67387.2025.11292309

Citations and references

Cited by 20 references
