Article

Multi-Agent Reinforcement Learning With Privacy Preservation for Continuous Double Auction-Based P2P Energy Trading

Jiehui Zheng, School of Electric Power Engineering, South China University of Technology, Guangzhou, China
Ze-Ting Liang, School of Electric Power Engineering, South China University of Technology, Guangzhou, China
Yuanzheng Li, School of Artificial Intelligence and Automation, Key Laboratory of Image Information Processing and Intelligent Control of Ministry of Education of China, Huazhong University of Science and Technology, Wuhan, China
Zhigang Li, School of Electric Power Engineering, South China University of Technology, Guangzhou, China
Qinghua Wu, School of Electric Power Engineering, South China University of Technology, Guangzhou, China

2024

ABI

Abstract

With the increasing deployment of distributed energy resources, the energy market, which aims to redistribute local generation and load profiles, faces the challenge of accommodating various types of participants. To maximize social welfare while preserving privacy in a dynamic energy market, this article proposes a multiagent reinforcement learning (MARL) method for quotation decision optimization in a continuous double auction (CDA)-based peer-to-peer (P2P) energy market. To address the nonstationarity and privacy violations introduced by the multiagent context, we use a mean-field approximation to abstract the unauthorized local information of other agents from the public market dynamics. An abstract Q-value function is developed for each agent to infer the neighbor agents' local observations and actions from the public clearing results in the dynamic CDA market. Moreover, to avoid sparse rewards and thereby stabilize the learning process, we add a dynamic potential-based reward shaping term to the reward. Without altering the learnt optimal policies, the shaping term informs the agents of the energy storage state at each time instant. To validate the effectiveness and economy of the proposed method, simulation studies are conducted on a real-world dataset. Simulation results show that the proposed MARL method produces up to 17% higher convergent episodic reward and 67% lower energy bills, which indicates competitive convergence performance and significant economic benefits.
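
The reward shaping idea mentioned in the abstract can be illustrated with a minimal Python sketch. It uses the standard dynamic potential-based form F = gamma * Phi(s', t+1) - Phi(s, t), which is known to leave optimal policies unchanged. The abstract does not give the paper's actual potential function, so `storage_potential`, its time-dependent weight, the discount value, and all function names below are hypothetical choices made only for illustration.

```python
from typing import Callable, List

# A potential maps (state of charge, time step) to a scalar value.
Potential = Callable[[float, int], float]


def storage_potential(soc: float, t: int) -> float:
    """Hypothetical potential: stored energy, weighted more at later steps.

    soc is the battery state of charge in [0, 1]. The linear weight is an
    assumption for illustration, not the potential used in the paper.
    """
    weight = 1.0 + 0.1 * t
    return weight * soc


def shaping_term(soc: float, t: int, next_soc: float, gamma: float = 0.95,
                 potential: Potential = storage_potential) -> float:
    """Dynamic potential-based shaping: gamma * Phi(s', t+1) - Phi(s, t)."""
    return gamma * potential(next_soc, t + 1) - potential(soc, t)


def shape_rewards(rewards: List[float], socs: List[float], gamma: float = 0.95,
                  potential: Potential = storage_potential) -> List[float]:
    """Add the shaping term to each step's environment reward.

    socs must hold one more entry than rewards: the state of charge before
    the first step and after every step.
    """
    if len(socs) != len(rewards) + 1:
        raise ValueError("socs must have len(rewards) + 1 entries")
    return [
        r + shaping_term(socs[t], t, socs[t + 1], gamma, potential)
        for t, r in enumerate(rewards)
    ]


if __name__ == "__main__":
    # Sparse environment reward: only the last step pays out.
    env_rewards = [0.0, 0.0, 1.0]
    state_of_charge = [0.2, 0.5, 0.9, 0.4]
    print(shape_rewards(env_rewards, state_of_charge))
```

With `gamma = 1.0` the shaping terms telescope, so the total shaped return differs from the original return only by the final potential minus the initial potential. That is one way to see why shaping of this form gives denser feedback without changing which policy is optimal.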

Identifiers

DOI: 10.1109/tii.2023.3348823

Citations and references

Citations: 9
References used: 0
