Article

A Hybrid Ensemble Framework for Rare Event Detection in Large-Scale Tabular Data

Natalya Maxutova, Department of Information Systems, L. N. Gumilyov Eurasian National University, Astana 010000, Kazakhstan
Akmaral Kassymova, Department of Information Technology, Zhangir Khan University, Uralsk 010009, Kazakhstan
Kuanysh Kadirkulov, Department of Information Systems, S. Seifullin Kazakh Agrotechnical Research University, Astana 010000, Kazakhstan
Aisulu Ismailova, Department of Information Systems, S. Seifullin Kazakh Agrotechnical Research University, Astana 010000, Kazakhstan
Gulkiz Zhidekulova, Department of Cybersecurity and Cryptology, Al-Farabi Kazakh National University, Almaty 010002, Kazakhstan
Zhanar Azhibekova, Department of Information and communication technologies, S. Asfendiyarov Kazakh National Medical University, Almaty 010002, Kazakhstan
Jamalbek Tussupov, Department of Information Systems, L. N. Gumilyov Eurasian National University, Astana 010000, Kazakhstan
Quvvatali Rakhimov, Department of Applied Mathematics and Informatics, Fergana State University, Fergana 150100, Uzbekistan
Zhanat Kenzhebayeva, Department of Computer Science, Caspian University of Technology and Engineering, Sh. Yessenov, Aktau 130000, Kazakhstan
Computers · journal · 2026 · en

Abstract

Rare event detection in large tabular data remains a computationally challenging problem due to class imbalance, heterogeneous feature distributions, and unstable decision thresholds. Traditional machine learning approaches based on individual models and fixed thresholds often exhibit limited robustness and reproducibility in such settings. This paper proposes a hybrid ensemble framework for rare event detection that integrates heterogeneous machine learning models through threshold-aware probabilistic aggregation. The framework combines gradient-boosted decision trees, regularized linear models, and neural networks, leveraging their complementary inductive biases. To ensure reproducibility and robust performance evaluation under severe class imbalance, a leakage-controlled evaluation protocol is employed, including rootwise summation, probability calibration, and validation-based threshold optimization. The proposed approach is evaluated on a large tabular dataset containing approximately 50,000 observations. Experimental results demonstrate improved rare event detection and robust generalization performance compared to individual baseline models. Explainability is achieved through SHapley Additive exPlanations (SHAP)-based attribution analysis and clustering in the explanation space, enabling transparent analysis of ensemble decision-making behavior. The proposed framework is a general-purpose computational solution for rare event detection and can be applied to a wide range of data-driven decision-making and anomaly detection problems.
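
The two steps the abstract calls probabilistic aggregation and validation-based threshold optimization can be illustrated with a minimal sketch. This is not the authors' implementation: the unweighted mean, the F1 criterion, the threshold grid, and all function names and toy probabilities below are assumptions made for illustration, and the probability calibration step is omitted.

```python
from typing import Sequence


def average_probabilities(model_probs: Sequence[Sequence[float]]) -> list[float]:
    """Unweighted mean of per-model positive-class probabilities (assumed rule)."""
    n_models = len(model_probs)
    return [sum(column) / n_models for column in zip(*model_probs)]


def f1_at_threshold(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """F1 score of the positive (rare) class when predicting 1 if p >= threshold."""
    tp = fp = fn = 0
    for y, p in zip(y_true, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def select_threshold(y_val: Sequence[int], probs_val: Sequence[float]) -> tuple[float, float]:
    """Scan a fixed grid on validation data and keep the first threshold with the best F1."""
    grid = [i / 100 for i in range(1, 100)]
    best_threshold, best_f1 = grid[0], -1.0
    for t in grid:
        score = f1_at_threshold(y_val, probs_val, t)
        if score > best_f1:
            best_threshold, best_f1 = t, score
    return best_threshold, best_f1


# Hypothetical validation-set probabilities from three heterogeneous models.
y_val = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
gbdt_probs = [0.05, 0.10, 0.02, 0.20, 0.08, 0.15, 0.30, 0.60, 0.12, 0.40]
linear_probs = [0.10, 0.05, 0.08, 0.25, 0.04, 0.10, 0.35, 0.45, 0.09, 0.30]
mlp_probs = [0.03, 0.12, 0.05, 0.15, 0.06, 0.20, 0.25, 0.75, 0.15, 0.35]

ensemble_probs = average_probabilities([gbdt_probs, linear_probs, mlp_probs])
best_threshold, best_f1 = select_threshold(y_val, ensemble_probs)
print(f"threshold={best_threshold:.2f}, validation F1={best_f1:.2f}")
```

In this toy data the selected threshold falls well below the conventional 0.5, which is the usual motivation for tuning it on a validation split when positives are rare. The threshold chosen this way would then be frozen and applied unchanged to the held-out test set.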

Citations and references

Cited by 0 · 31 references