Developing a Semantic Search-Based Chatbot for Legal Queries in the Uzbek Language Using Big Data Resources
Abstract
In this paper, we propose a novel model for answering legal queries in the Uzbek language using semantic search over a large repository of legal documents. The model addresses the challenges of low-resource languages such as Uzbek, for which natural language processing (NLP) tools are scarce, by combining multilingual embeddings, retrieval-augmented generation (RAG), and an original hybrid algorithm named the Uzbek-Specific Hybrid Semantic Retrieval Algorithm (USHRA). The Lex.uz database, the national legal information portal of Uzbekistan, serves as the main big data source and provides laws, decrees, and court decisions. On a test set of 200 criminal-law queries, the chatbot prototype answers user questions with an accuracy of 85 percent. This work contributes to AI-based legal services in Central Asia by offering semantic understanding beyond what traditional keyword search provides. The model reflects the growing adoption of legal technology in emerging markets, in line with Uzbekistan's digital economy strategy and its national roadmap for AI implementation. Future versions may add real-time updates and multilingual support.