Hybrid Big Data Analytics: Integrating Structured and Unstructured Data for Predictive Intelligence
Аннотация
Hybrid big data analytics has emerged as a compelling paradigm for predictive intelligence, yet most operational pipelines still privilege a single modality—either structured relational data or unstructured text—thereby under-exploiting complementary signals. This paper proposes a unified framework that integrates structured records (e.g., time-series sensors, tabular attributes) with unstructured corpora (e.g., clinical narratives, web-scale text) through a multi-modal deep learning architecture coupled with scalable clustering and query optimization. The method fuses static encoders, temporal CNN/LSTM modules, and text representations (e.g., document embeddings with BiLSTM/CNN) in a learned fusion layer, and augments inference with a Gaussian Mixture Model optimized by a bio-inspired Salp Swarm Algorithm for low-latency, distributed querying. Experiments across two representative domains—infectious-disease forecasting and Industry 4.0 cycle-time projection—demonstrate consistent gains over single-modality baselines in AUROC, F1, MAE, and AUPRC, while preserving near real-time responsiveness on commodity GPU/CPU clusters. We discuss integration complexity, interpretability challenges, and deployment constraints, and delineate practical pathways for edge-side execution, transfer learning across domains, and explainability overlays. By systematically bridging structured and unstructured modalities, the study evidences material performance improvements and offers a robust template for multimodal analytics in high-stakes environments.
Перевод пока недоступен