TY - GEN
T1 - Lyapunov-Based Reward Function Design for Reinforcement Learning Control of Grid-Connected Multilevel Inverters
T2 - 2025 IEEE 4th Industrial Electronics Society Annual On-Line Conference, ONCON 2025
AU - Alquennah, Alamera Nouran
AU - Kouzou, Ahmed
AU - Zamzam, Tassneem
AU - Bayhan, Sertac
AU - Trabelsi, Mohamed
AU - Abu-Rub, Haitham
AU - Ghrayeb, Ali
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - This paper proposes a Lyapunov-based reward function design for Reinforcement Learning Control (RL-C) of a grid-connected 5-level Packed U Cell (PUC5) inverter. Unlike existing RL approaches that rely solely on tracking-error-based rewards, the proposed method integrates the Lyapunov stability condition directly into the reward formulation, ensuring stable learning and robust closed-loop performance. The control objectives are to regulate the flying capacitor voltage and inject a low total harmonic distortion (THD) grid current. The Proximal Policy Optimization (PPO) algorithm is employed and trained in a MATLAB/Simulink environment under randomized operating conditions to enhance generalization. Simulation results demonstrate that the Lyapunov-based RL-C achieves stable operation with THD as low as 1.9 % and capacitor voltage error below 1 V across different current levels. Moreover, the trained agent exhibits strong adaptability to parameter variations and untrained operating points, confirming the proposed framework's robustness and suitability for real-time power electronics applications.
AB - This paper proposes a Lyapunov-based reward function design for Reinforcement Learning Control (RL-C) of a grid-connected 5-level Packed U Cell (PUC5) inverter. Unlike existing RL approaches that rely solely on tracking-error-based rewards, the proposed method integrates the Lyapunov stability condition directly into the reward formulation, ensuring stable learning and robust closed-loop performance. The control objectives are to regulate the flying capacitor voltage and inject a low total harmonic distortion (THD) grid current. The Proximal Policy Optimization (PPO) algorithm is employed and trained in a MATLAB/Simulink environment under randomized operating conditions to enhance generalization. Simulation results demonstrate that the Lyapunov-based RL-C achieves stable operation with THD as low as 1.9 % and capacitor voltage error below 1 V across different current levels. Moreover, the trained agent exhibits strong adaptability to parameter variations and untrained operating points, confirming the proposed framework's robustness and suitability for real-time power electronics applications.
KW - Lyapunov stability
KW - Multilevel inverter
KW - Reinforcement learning
UR - https://www.scopus.com/pages/publications/105035611953
U2 - 10.1109/ONCON68412.2025.11384260
DO - 10.1109/ONCON68412.2025.11384260
M3 - Conference contribution
AN - SCOPUS:105035611953
T3 - 2025 IEEE 4th Industrial Electronics Society Annual On-Line Conference, ONCON 2025
BT - 2025 IEEE 4th Industrial Electronics Society Annual On-Line Conference, ONCON 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 11 December 2025 through 13 December 2025
ER -