A Unified Big Data Analytics Framework using AutoML and Deep Learning for Real-time Business Intelligence
Abstract
This paper presents an artefact-centric analytics framework that reconciles predictive utility, low-latency inference, and auditable traceability for real-time business intelligence (BI). Modern BI systems increasingly require low-latency, auditable predictive analytics, but they suffer from gaps between offline model development and production serving that are caused by feature-parity breaks, schema drift, tail latency, and weak KPI → exemplar traceability. Our design couples a parity-preserving feature fabric (materialised Delta views) with constrained AutoML using multi-fidelity search and a compilation/distillation pipeline that produces registry-tracked ONNX/TVM artefacts. A gated serving policy and a distilled fast path reconcile ensemble-quality decisions with median- and tail-latency budgets, while provenance capture and schema/version governance enable KPI → exemplar traceability and auditable rollbacks. Evaluation on a production-like BI workload shows that the AutoML-selected ensemble achieves F1 = 0.768, while the distilled fast path recovers F1 = 0.754 and meets median latency targets by limiting ensemble invocations to ≤10%. Ablation studies show that multi-fidelity evaluation reduces search cost with modest utility loss, and drift-injection experiments show that automated warm-start retraining restores KPIs within a single retraining cycle (approximately 3–4 hours) when thresholds are exceeded. Contributions include an artefact-centric pipeline that enforces offline/online parity and traceability; a constrained AutoML and compilation workflow that meets deployment budgets; and an operational governance stack that validates automated recovery across diverse BI domains.
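As an illustration of how a gated serving policy might cap ensemble invocations, the following minimal sketch assumes the gate is a confidence threshold on the distilled fast-path model. That gating criterion is an assumption, not something the abstract specifies, and the names `calibrate_gate`, `serve`, `fast_path`, and `ensemble` are hypothetical.

```python
from typing import Any, Callable, Sequence, Tuple


def calibrate_gate(confidences: Sequence[float], max_escalation: float = 0.10) -> float:
    """Choose a confidence threshold from calibration data.

    Requests whose fast-path confidence is strictly below the returned
    threshold are escalated; on the calibration set, at most a fraction
    `max_escalation` of requests fall below it.
    (Assumed gating rule; the paper may use a different criterion.)
    """
    if not confidences:
        raise ValueError("need at least one calibration confidence")
    if not 0.0 <= max_escalation < 1.0:
        raise ValueError("max_escalation must be in [0, 1)")
    ordered = sorted(confidences)
    k = int(len(ordered) * max_escalation)  # max number of escalated requests
    # At most k values are strictly smaller than ordered[k], even with ties.
    return ordered[k]


def serve(
    features: Any,
    fast_path: Callable[[Any], Tuple[Any, float]],
    ensemble: Callable[[Any], Any],
    threshold: float,
) -> Tuple[Any, str]:
    """Answer from the fast path unless its confidence is below the gate."""
    label, confidence = fast_path(features)
    if confidence < threshold:
        # Low confidence: pay the latency cost of the full ensemble.
        return ensemble(features), "ensemble"
    return label, "fast_path"


# Hypothetical calibration confidences: 0.00, 0.05, ..., 0.95
calibration = [i / 20 for i in range(20)]
gate = calibrate_gate(calibration, max_escalation=0.10)
escalated = sum(c < gate for c in calibration)
```

With these 20 calibration values, two requests fall below the gate, which is exactly the 10% budget. In practice the threshold would need recalibrating whenever the confidence distribution drifts.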