Article

Data-Driven Analysis of Semantic and Phraseo-Semantic Fields Integrating Information Retrieval Systems for Cross-Linguistic Comparative Studies

Abduraxim Nasirov, Rectorate, Uzbekistan State University of World Languages, Tashkent, Uzbekistan
Oybek Eshbayev, Digital Economy Department, Tashkent State University of Economics, Tashkent, Uzbekistan

2024 · en

ABI

Abstract

This study analyzes semantic and phraseo-semantic field structures using semantic clustering, regression modeling, and network visualization, and proposes a data-driven comparative method for identifying cross-linguistic semantic equivalences and cultural nuances. Using a multilingual, corpus-based approach grounded in natural language processing (NLP), an integrated hierarchical semantic model was computed and classified into three distinct levels of semantic alignment. A set of phraseological units was selected from Uzbek, Russian, and French textual corpora, together with the degree of cultural proximity relevant to sustaining linguistic equivalence. Contextual embedding techniques and network analysis models were applied to generate maps of the two factors required for the cross-linguistic comparison: semantic equivalence and cultural alignment. The TF-IDF method, together with Gephi network visualization, was used to identify and map these factors, while regression modeling was used to calculate alignment discrepancies. The entire process was carried out in an integrated toolchain combining Gephi with Python-based NLP software for data-driven linguistic analysis. The results showed that overall semantic alignment across the corpora was moderate and was partly determined by cultural nuances. Specific phraseo-semantic clusters in the Uzbek and Russian corpora showed significant disparities in phraseological alignment. The study also found that cultural context and phraseological uniqueness strongly influence the semantic alignment and equivalence of the fields analyzed.
Considering the linguistic and cultural diversity of the analyzed corpora, the material was further segmented into four priority semantic clusters, which may serve as baseline references for semantic alignment studies and for the development of cross-linguistic comparative frameworks.
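
The abstract names TF-IDF weighting and network mapping among its methods but gives no implementation details. The following is only a minimal sketch of how such a step might look, not the authors' pipeline. It assumes a smoothed IDF variant, cosine similarity as the alignment score, and an arbitrary edge threshold. The unit names and tokens are hypothetical placeholders, not data from the study.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Assumption: smoothed IDF, ln((1 + N) / (1 + df)) + 1.
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical placeholder units: token lists standing in for glossed
# phraseological units from different corpora.
units = {
    "unit_a": ["heart", "fire", "courage"],
    "unit_b": ["heart", "fire", "anger"],
    "unit_c": ["water", "road", "patience"],
}

names = list(units)
vectors = dict(zip(names, tf_idf_vectors([units[n] for n in names])))

# Keep only pairs whose similarity exceeds an (assumed) threshold; the
# resulting weighted edge list could be written to CSV for a graph tool.
THRESHOLD = 0.1
edges = [
    (a, b, cosine(vectors[a], vectors[b]))
    for a, b in combinations(names, 2)
    if cosine(vectors[a], vectors[b]) > THRESHOLD
]
```

In this toy setup only `unit_a` and `unit_b` share terms, so they form the single edge; units with no shared vocabulary score zero and stay unconnected.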

Topics

Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques

Identifiers

DOI: 10.1145/3726122.3726241

Citations and references

10 citations · 21 references
