TY - GEN
T1 - Training Efficiency of DDQN-Based Multilevel Inverter Control
T2 - 2nd Symposium on Smart, Sustainable, and Secure Internet of Things, S4IoT 2025
AU - Alquennah, Alamera Nouran
AU - Hamed, Sara
AU - Zamzam, Tassneem
AU - Abu-Rub, Haitham
AU - Trabelsi, Mohamed
AU - Bayhan, Sertac
AU - Ghrayeb, Ali
AU - Khatri, Sunil
N1 - Publisher Copyright:
© The Author(s) 2026.
PY - 2026
Y1 - 2026
N2 - Reinforcement Learning (RL)-based controllers have recently gained attention as AI-driven, model-free methods for controlling power electronic converters by learning optimal control actions through continuous interaction with the environment. Their learning process is governed by a reward function, which guides the agent’s behavior. This paper investigates the influence of incorporating penalty terms into the reward function on the training efficiency and performance of an RL-based controller for a 7-level grid-tied Packed-U-Cell (PUC7) multilevel inverter. The controller is developed using the Double Deep Q-Network (DDQN) algorithm, selected for its balanced combination of strong performance and ease of implementation. The control objectives include sinusoidal current injection into the grid and capacitor voltage regulation around the desired value. The reward function is designed based on current and voltage tracking errors, with two penalty terms introduced to limit deviations beyond predefined thresholds. The study evaluates the impact of varying these penalty magnitudes on learning speed, convergence behavior, and tracking quality. Simulations are conducted in MATLAB/Simulink, demonstrating that the appropriate selection and application of penalties improve training efficiency without compromising control performance.
KW - Reinforcement learning
KW - Reward function
UR - https://www.scopus.com/pages/publications/105029528174
DO - 10.1007/978-981-95-5136-1_15
M3 - Conference contribution
AN - SCOPUS:105029528174
SN - 9789819551354
T3 - Lecture Notes in Electrical Engineering
SP - 151
EP - 162
BT - Proceedings of the 2nd Symposium on Smart, Sustainable, and Secure Internet of Things, S4IoT 2025
A2 - Trabelsi, Mohamed
A2 - Bouida, Zied
A2 - Murugappan, M.
A2 - Khan, Murad
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 6 May 2025 through 7 May 2025
ER -