COMPARATIVE ANALYSIS AND IMPLEMENTATION OF MULTI-LEVEL NORMALIZATION TECHNIQUES IN NETWORK DATABASES
Abstract
Modern network databases integrate data from heterogeneous sources that differ significantly in format, scale, encoding, and semantic structure. Normalizing such data at multiple levels (record, field, and value) remains a critical and challenging step in data integration pipelines. This paper presents a comparative analysis and practical implementation of multi-level normalization techniques for twelve data types encountered in network databases: numeric/decimal, integer, text, categorical, temporal, boolean, binary/file, spatial, monetary, array/set, JSON/XML, and null values. For each data type, we systematically compare candidate normalization methods, including min-max scaling, Z-score standardization, one-hot encoding, label encoding, Unicode normalization, ISO 8601 formatting, and log scaling, and identify the conditions under which each is most appropriate. We highlight the trade-offs between methods in terms of distribution preservation, semantic consistency, and computational cost. A Python-based implementation is evaluated on a synthetic dataset of 1,000 records, demonstrating that type-aware multi-level normalization reduces numerical attribute dispersion by approximately 70% and achieves 100% structural consistency in text and semi-structured fields. The proposed framework supports downstream tasks including ontological mapping, semantic integration, and 3D data visualization.
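As a brief illustration of four of the value-level methods named above (min-max scaling, Z-score standardization, Unicode normalization, and ISO 8601 formatting), the following minimal sketch uses only the Python standard library. It is not the implementation evaluated in this paper. The function names, the choice of the NFC normalization form, the use of the population standard deviation, and the example date format are all assumptions made for illustration.

```python
import math
import unicodedata
from datetime import datetime


def min_max(values):
    """Rescale numeric values linearly into [0, 1] (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: no spread to rescale, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize to zero mean and unit population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        # Constant column: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def normalize_text(s):
    """Apply Unicode NFC normalization (assumed form) and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFC", s).split())


def to_iso8601(s, fmt):
    """Parse a date string in a known format and emit an ISO 8601 date."""
    return datetime.strptime(s, fmt).date().isoformat()


if __name__ == "__main__":
    print(min_max([10, 20, 30]))
    print(z_score([1, 2, 3]))
    print(normalize_text("Cafe\u0301  au   lait"))
    print(to_iso8601("31/12/2023", "%d/%m/%Y"))
```

Which of the two numeric methods to prefer depends on the trade-offs discussed in this paper: min-max scaling bounds the output range but is sensitive to outliers, while Z-score standardization centers the values without bounding them.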