Article

Increasing the Reliability of Information in Electronic Documents Based on the Use of Statistical Redundancy and n-Gram Structure of Texts

Khusan B. Karshiev, Samarkand State University named after Sharof Rashidov, Department of "Artificial Intelligence and Information Systems", Samarkand, Uzbekistan
Isroil I. Jumanov, Samarkand State University named after Sharof Rashidov, Department of "Artificial Intelligence and Information Systems", Samarkand, Uzbekistan
2025, English
ABI

Abstract

This paper formulates the problem of increasing the reliability of information in electronic document management systems and develops a methodology for building a technology that addresses it. The technology relies on mechanisms that extract statistical, logical, and semantic links, specific characteristics of document elements, and relationships between document concepts. These mechanisms determine how the statistical redundancy of the text is used, based on models of multidimensional probability distributions of mono-, di-, tri-, and n-grams. The resulting tools increase reliability by modifying the rules of statistical Huffman coding, forming n-gram text statistics, and determining a rational set of hash functions. A methodology is developed for studying the probability function of undetected errors with adaptable boundaries, variables, and intervals for checking whether document elements belong to subsets of expanded and prohibited values. A two-alternative classifier that separates reliable from unreliable information is studied; its capabilities are extended by a clustering mechanism with a one-sided model that scans a text line from the left or from the right. A software package for increasing the reliability of information has been developed and implemented in C++; it combines the proposed mechanisms, which run in the CUDA parallel computing environment. The package is designed to detect and correct multiple errors in information.
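
To make the idea of n-gram statistical redundancy concrete, the following is a minimal sketch and not the authors' implementation (which the abstract describes as a C++ and CUDA package). It assumes character-level n-grams, a tiny made-up reference text, and a simple rule that treats any n-gram unseen in the reference as suspicious. The function names `char_ngrams`, `build_ngram_counts`, and `suspicious_ngrams` and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_ngram_counts(corpus, n):
    """Count character n-grams over every word of a reference corpus."""
    counts = Counter()
    for word in corpus.lower().split():
        counts.update(char_ngrams(word, n))
    return counts


def suspicious_ngrams(word, counts, n, min_count=1):
    """List n-grams of `word` seen fewer than `min_count` times in the reference.

    Assumption: an n-gram that is absent from (or rare in) the reference
    statistics is treated as a sign of a possible error in the word.
    """
    return [g for g in char_ngrams(word.lower(), n) if counts[g] < min_count]


# Illustrative reference text (an assumption, not data from the paper).
reference = "the document contains reliable information about the document system"
bigrams = build_ngram_counts(reference, 2)

print(suspicious_ngrams("document", bigrams, 2))  # word from the reference text
print(suspicious_ngrams("docvment", bigrams, 2))  # same word with one corrupted letter
```

In this sketch a single corrupted character produces unseen bigrams on both sides of the error, which localizes it within the word. Correcting the error, as the package described above is designed to do, would require additional steps that are not shown here.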
