Building an Open-Source HA Cluster: Proxmox VE + Ceph for “Five Nines” Service Continuity
Abstract
This paper presents the design, deployment, and evaluation of a cost-efficient, open-source high-availability (HA) cluster for corporate IT services using Proxmox VE and Ceph. Motivated by national digital-transformation priorities, we implement a three-node Proxmox VE cluster with distributed Ceph storage, network segmentation that separates management, replication, and client traffic, and HA orchestration via the Proxmox HA manager. On this platform we provision mission-critical services (file sharing, a replicated database tier, web hosting, and mail) and assess failure behavior under node and service fault scenarios. Measurements show automatic service recovery in ∼120 s after a physical-node failure and in ∼10 s after process- or guest-level failures. With single points of failure eliminated at the compute, storage, and network levels, these recovery times support an effective annual availability of up to “five nines” (99.999%, i.e., about 5.26 minutes of downtime per year). We further analyze the remaining reliability risks and their mitigations, and give a brief cost/benefit assessment highlighting the viability of open technologies for resource-constrained organizations. The results indicate that the proposed architecture is reproducible, scalable, and suitable for medium-to-large enterprises seeking HA without proprietary licensing overhead.
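To make the link between the measured recovery times and the “five nines” target concrete, the Python sketch below computes the annual downtime budget for a given availability level and the availability that results from a number of failures. It assumes that each failure causes downtime equal to its measured recovery time (∼120 s per node failure, ∼10 s per guest failure) and ignores planned maintenance. The failure counts used are hypothetical illustrations, not results from this study.

```python
import math

# Seconds in an average (Julian) year: 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def downtime_budget(availability: float) -> float:
    """Maximum downtime per year, in seconds, permitted by a target availability."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def annual_availability(node_failures: int, guest_failures: int,
                        node_recovery_s: float = 120.0,
                        guest_recovery_s: float = 10.0) -> float:
    """Availability, assuming each failure costs exactly its recovery time."""
    downtime = node_failures * node_recovery_s + guest_failures * guest_recovery_s
    return 1.0 - downtime / SECONDS_PER_YEAR


def max_node_failures(availability: float, node_recovery_s: float = 120.0) -> int:
    """How many node failures per year fit in the downtime budget (no guest failures)."""
    return math.floor(downtime_budget(availability) / node_recovery_s)


if __name__ == "__main__":
    target = 0.99999  # "five nines"
    print(f"Five-nines budget: {downtime_budget(target):.1f} s/year")
    print(f"Node failures that fit the budget: {max_node_failures(target)}")
    # Hypothetical scenario (not measured): 2 node failures and 6 guest failures per year.
    a = annual_availability(node_failures=2, guest_failures=6)
    print(f"Availability: {a:.7%} (meets target: {a >= target})")
```

Under these assumptions the budget is roughly 315.6 s per year, so two ∼120 s node failures still fit within five nines, while a third would exceed it. This is why the claim is stated as “up to” five nines: it depends on how often failures actually occur.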