Article

Increasing the Reliability of Information in Electronic Documents Based on the Use of Statistical Redundancy and n-Gram Structure of Texts

Khusan B. Karshiev, Samarkand State University Named After Sharof Rashidov, Department of "Artificial Intelligence and Information Systems", Samarkand, Uzbekistan
Isroil I. Jumanov, Samarkand State University Named After Sharof Rashidov, Department of "Artificial Intelligence and Information Systems", Samarkand, Uzbekistan

2025, English

Abstract

The paper formulates the problem of increasing the reliability of information in electronic document management systems and develops a methodology for building a technology that addresses it. The technology includes mechanisms for extracting statistical, logical, and semantic links, as well as the specific characteristics of document elements and of the relationships between document concepts. These mechanisms determine how the statistical redundancy of the text is exploited through models of multidimensional probability distributions of mono-, di-, tri-, and n-grams. Tools for increasing reliability are obtained by modifying the rules of statistical Huffman coding, forming n-gram statistics of the text, and determining a rational set of hash functions. A methodology is developed for studying the probability function of undetected errors, with adaptable boundaries, variables, and intervals used to check whether document elements belong to the subsets of expanded and prohibited values. A two-alternative classifier that separates reliable from unreliable information is studied; its capabilities are extended by a clustering mechanism with a one-sided model that views a text line from either the left or the right side. A software package for increasing the reliability of information has been developed in C++; it integrates the proposed mechanisms, which run in the CUDA parallel computing environment. The package is designed to detect and correct multiple errors in information.
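The abstract does not specify how the n-gram statistics are applied to individual text lines. As a rough illustration only, the following C++ sketch shows one simple way character trigram statistics from a reference corpus can expose a corrupted character, loosely echoing the idea of viewing a line from the left and from the right. The function names (`countNGrams`, `isRare`, `flagSuspicious`), the `minCount` threshold, and the rule that a position is flagged only when both one-sided views see a rare n-gram are hypothetical assumptions, not the authors' method; the sketch does not model Huffman coding, hash functions, or CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: character n-gram statistics used to flag suspicious
// positions in a text line. NOT the authors' algorithm; it only illustrates
// how statistical redundancy of text can expose errors.

using NGramCounts = std::unordered_map<std::string, std::size_t>;

// Count every character n-gram of length n in the reference corpus.
NGramCounts countNGrams(const std::string& corpus, std::size_t n) {
    assert(n > 0 && "n-gram length must be positive");
    NGramCounts counts;
    for (std::size_t i = 0; i + n <= corpus.size(); ++i) {
        ++counts[corpus.substr(i, n)];
    }
    return counts;
}

// True if the n-gram occurs fewer than minCount times in the corpus,
// i.e. it falls into an (assumed) "prohibited" subset.
bool isRare(const NGramCounts& counts, const std::string& gram, std::size_t minCount) {
    auto it = counts.find(gram);
    return it == counts.end() || it->second < minCount;
}

// Flags position i when the n-gram ending at i (left view) and the n-gram
// starting at i (right view) are both rare. Near the line edges, where only
// one view exists, that view decides alone; positions with no complete
// n-gram on either side are never flagged.
std::vector<std::size_t> flagSuspicious(const std::string& line,
                                        const NGramCounts& counts,
                                        std::size_t n, std::size_t minCount) {
    std::vector<std::size_t> flagged;
    if (n == 0 || line.size() < n) return flagged;
    for (std::size_t i = 0; i < line.size(); ++i) {
        bool hasLeft = i + 1 >= n;             // line[i-n+1 .. i] exists
        bool hasRight = i + n <= line.size();  // line[i .. i+n-1] exists
        bool leftRare = hasLeft && isRare(counts, line.substr(i + 1 - n, n), minCount);
        bool rightRare = hasRight && isRare(counts, line.substr(i, n), minCount);
        bool suspicious = (hasLeft && hasRight) ? (leftRare && rightRare)
                                                : (leftRare || rightRare);
        if (suspicious) flagged.push_back(i);
    }
    return flagged;
}
```

For example, with counts built from the corpus "the cat sat on the mat and the cat ate the rat" and n = 3, minCount = 1, `flagSuspicious("the cxt sat", counts, 3, 1)` returns {5}, the index of the substituted character, while "the cat sat" yields no flags. Requiring both one-sided views to agree narrows a single substitution to one position, because each neighbouring character sees the corrupted trigram from only one side.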

Topics

Information Systems and Technology Applications; Library Science and Information; Scientific Research and Philosophical Inquiry

Identifiers

DOI: 10.1109/rusautocon65989.2025.11177396

Citations and references

Cited by 0; 14 references
