Article

Multi-Agent Reinforcement Learning for Optimizing Cloud Resource Scheduling in Hybrid Cloud Environments

C Rajan, Department of Computer Applications, K S Rangasamy College of Technology, Tamil Nadu, India
Saravanakumar Veerappan, Directorate, Centivens Institute of Innovative Research, Tamil Nadu, India
Moti Ranjan Tandi, Department of Computer Science, Kalinga University, Chhattisgarh, India
P Balamurugan, Department of Networking and Communications, SRM Institute of Science and Technology, Tamil Nadu, India
Sanjar Goyipnazarov, Faculty of Economics, Tashkent State University of Economics, Tashkent, Uzbekistan

2025

ABI

Abstract

The growing complexity and scale of machine learning (ML) operations in the cloud expose the limits of traditional resource allocation and parallel processing mechanisms. Traditional cloud systems struggle to meet the demands of dynamic ML workloads, particularly in throughput, latency, and energy efficiency. This paper addresses this gap by proposing a Multi-Agent Reinforcement Learning (MARL) framework for cloud resource scheduling in hybrid cloud environments comprising private, public, and edge resources. The problem is modeled as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP), in which decentralized agents coordinate resource allocation and workload distribution to maximize resource utilization while minimizing latency and cost. The MARL approach enables adaptive, distributed decision-making for dynamic workload offloading, enhancing performance across heterogeneous classical resources. Experiments in CloudSim and EdgeCloudSim show that the proposed framework improves average response time, resource utilization, and energy efficiency relative to traditional heuristic methods and single-agent RL models. The framework offers an efficient and scalable approach to accelerating ML tasks and a path toward intelligent, dynamic cloud resource orchestration.
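The Dec-POMDP framing in the abstract can be illustrated with a toy sketch: each agent observes only its own tier's queue, all agents share one team reward, and each learns independently. This is not the paper's method. The tier parameters, the reward weights, the keep-or-offload action, and the use of independent tabular Q-learning are all assumptions chosen for brevity.

```python
import random
from dataclasses import dataclass

@dataclass
class Tier:
    """One resource tier. All values used below are hypothetical."""
    name: str
    service_rate: int   # tasks completed per step
    unit_cost: float    # cost charged per task served
    queue: int = 0      # local backlog; the only thing this tier's agent observes

def step(tiers, actions, arrivals):
    """Apply one joint action. actions[i] is True if agent i keeps its arriving
    tasks locally, False if it offloads them to the next tier in the list."""
    n = len(tiers)
    for i, tier in enumerate(tiers):
        target = tier if actions[i] else tiers[(i + 1) % n]
        target.queue += arrivals[i]
    cost = 0.0
    for tier in tiers:
        served = min(tier.queue, tier.service_rate)
        tier.queue -= served
        cost += served * tier.unit_cost
    backlog = sum(t.queue for t in tiers)        # crude latency proxy
    reward = -(backlog + 0.5 * cost)             # one team reward shared by all agents
    observations = [t.queue for t in tiers]      # agent i sees only observations[i]
    return observations, reward

class IndependentQAgent:
    """Tabular Q-learner acting on its local observation only."""
    def __init__(self, rng, alpha=0.1, gamma=0.9, epsilon=0.1, cap=10):
        self.rng, self.alpha, self.gamma = rng, alpha, gamma
        self.epsilon, self.cap = epsilon, cap
        self.q = {}

    def _key(self, obs):
        return min(obs, self.cap)                # bucket long queues together

    def act(self, obs):
        if self.rng.random() < self.epsilon:     # explore
            return self.rng.random() < 0.5
        s = self._key(obs)
        return self.q.get((s, True), 0.0) >= self.q.get((s, False), 0.0)

    def learn(self, obs, action, reward, next_obs):
        s, s2 = self._key(obs), self._key(next_obs)
        best_next = max(self.q.get((s2, a), 0.0) for a in (True, False))
        old = self.q.get((s, action), 0.0)
        self.q[(s, action)] = old + self.alpha * (reward + self.gamma * best_next - old)

def run(steps=200, seed=0):
    rng = random.Random(seed)
    tiers = [Tier("private", 3, 0.2), Tier("public", 5, 1.0), Tier("edge", 2, 0.1)]
    agents = [IndependentQAgent(rng) for _ in tiers]
    obs = [t.queue for t in tiers]
    total = 0.0
    for _ in range(steps):
        actions = [agent.act(o) for agent, o in zip(agents, obs)]
        arrivals = [rng.randint(0, 4) for _ in tiers]
        next_obs, reward = step(tiers, actions, arrivals)
        for agent, o, a, o2 in zip(agents, obs, actions, next_obs):
            agent.learn(o, a, reward, o2)        # same reward for every agent
        obs = next_obs
        total += reward
    return total / steps, agents
```

Calling `run()` returns the average team reward per step and the trained agents. The sketch only shows the structure of decentralized observation with a shared objective; a realistic study would rely on a simulator such as CloudSim or EdgeCloudSim, as the abstract describes.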

Topics

IoT and Edge/Fog Computing
Cloud Computing and Resource Management
Software-Defined Networks and 5G

Identifiers

DOI: 10.1145/3789692.3789792

Citations and references

Cited by 0
10 references
