Article

Corpus-based Uncertainty Analysis of Multilingual Media under Language Policy

Sulieman Ibraheem Shelash Al-Hawary: Electronic Marketing and Social Media, Economic and Administrative Sciences, Zarqa University, Zarqa 13110, Jordan; Faculty of Business and Communications, INTI International University, Nilai 71800, Malaysia
Yogeesh Nijalingappa: Department of Mathematics, Government First Grade College, Tumkur 572101, India
Hanan Jadallah: Electronic Marketing and Social Media, Economic and Administrative Sciences, Zarqa University, Zarqa 13110, Jordan
Naveed Iqbal Raja: Department of Visual Communication, Sathyabama Institute of Science and Technology, Chennai 600119, India
Azizbek Qaraqulov: Department of Uzbek Language and Literature, Termez University of Economics and Service, Termez 190111, Uzbekistan
Asokan Vasudevan: Faculty of Business and Communications, INTI International University, Nilai 71800, Malaysia; Faculty of Management, Shinawatra University, Sam Khok 12160, Thailand; Department of Business Studies, Wekerle Business School, 1083 Budapest, Hungary
Sadoqat Masharipova: Department of Roman-Germanic Philology, Mamun University, Khiva 220900, Uzbekistan

Forum for Linguistic Studies (journal), 2025

Abstract

This paper presents a mathematical framework for quantifying graded language mixing in media texts surrounding a policy reform. We model each document as generated by probabilistic n-gram models for two languages, interpret the resulting posterior probabilities as soft-membership degrees, and apply Shannon entropy to measure per-document mixing. A fuzzification exponent controls assignment sharpness, and aggregate entropy across documents yields a corpus-level metric tracked over pre- and post-reform intervals. In a case study of 20 headlines, mean entropy rose from 0.52 to 0.68 nats (∆ = 0.16), indicating increased code-mixing after the policy change. Statistical validation via a paired t-test (t = 3.27, p < 0.01) and a permutation test (p = 0.005) confirms the significance of this shift. Analysis of soft-membership distributions reveals a drop in average English membership from 0.77 to 0.52, further illustrating editorial adaptation. The modular implementation enables scalable analysis of large corpora, and an open-source toolkit is provided to promote reproducibility and extension to other bilingual or multilingual settings. We discuss limitations related to parameter sensitivity, model assumptions, and sample size, and outline future extensions involving imprecise-probability bounds, contextual embeddings, dynamic time-series modeling, and topic-augmented uncertainty. Our results demonstrate the power of information-theoretic tools for detecting subtle shifts in media discourse in response to regulatory changes.
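The per-document computation summarized in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact formulas, so the form below is an assumption: memberships are taken to be a two-class posterior with equal priors, the fuzzification exponent `m` is applied by dividing the log-likelihoods by `m`, and the corpus-level metric is the plain mean of per-document entropies. The function names and the input log-likelihoods are hypothetical and are not taken from the authors' toolkit.

```python
import math

def soft_memberships(loglik_a, loglik_b, m=1.0):
    """Soft-membership degrees for two languages (assumed tempered posterior).

    loglik_a, loglik_b: document log-likelihoods under each language's
    n-gram model. m: fuzzification exponent; in this assumed form, m > 1
    flattens the assignment and m < 1 sharpens it.
    """
    za, zb = loglik_a / m, loglik_b / m
    top = max(za, zb)  # subtract the max for numerical stability
    ea, eb = math.exp(za - top), math.exp(zb - top)
    total = ea + eb
    return ea / total, eb / total

def entropy_nats(memberships):
    """Shannon entropy in nats; zero memberships contribute nothing."""
    return -sum(u * math.log(u) for u in memberships if u > 0.0)

def corpus_entropy(doc_logliks, m=1.0):
    """Mean per-document entropy over (loglik_a, loglik_b) pairs."""
    values = [entropy_nats(soft_memberships(a, b, m)) for a, b in doc_logliks]
    return sum(values) / len(values)
```

With two languages, the entropy of a document is bounded above by ln 2 ≈ 0.693 nats, which is reached when both memberships equal 0.5, and it falls to 0 when a document is assigned entirely to one language. Under this sketch, calling `corpus_entropy` separately on the pre-reform and post-reform documents would give the two corpus-level values whose difference is tracked.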

Topics

Computational and Text Analysis Methods; Text Readability and Simplification; Authorship Attribution and Profiling

Identifiers

DOI: 10.30564/fls.v7i12.11494
