Balancing Value Iteration and Policy Iteration for Discrete-Time Control

  • Biao Luo*
  • Yin Yang
  • Huai Ning Wu
  • Tingwen Huang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

88 Citations (Scopus)

Abstract

The optimal control problem of discrete-time nonlinear systems depends on the solution of the Bellman equation. In this paper, an adaptive reinforcement learning (RL) method that balances value iteration (VI) and policy iteration (PI) is developed to solve the Bellman equation. By introducing a balance parameter, the adaptive RL integrates VI and PI, which accelerates VI and avoids the need for an initial admissible control. The convergence of the adaptive RL is proved by showing that it converges to the solution of the Bellman equation. The adaptive RL is then realized using a neural network (NN) approximation of the value function, and a least-squares scheme is developed for updating the NN weights. The convergence of the NN-based adaptive RL is proved while accounting for the NN approximation error. To further improve performance, an adaptive rule is developed for tuning the balance parameter iteration by iteration. Finally, the effectiveness of the adaptive RL is validated through simulation studies.
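To illustrate the general idea of balancing VI and PI with a single parameter, here is a minimal sketch on a hypothetical tabular MDP. The update rule below (a convex combination of a one-step VI backup and a multi-step policy-evaluation refinement under the greedy policy) is an assumption for illustration only; the paper's actual method targets nonlinear discrete-time systems with an NN-approximated value function, and its exact update is not reproduced here. All names (`blended_iteration`, `lam`, etc.) are hypothetical.

```python
import numpy as np

# Hypothetical random tabular MDP (not from the paper): 4 states, 2 actions.
n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # R[s, a]

def blended_iteration(lam=0.5, iters=200, eval_steps=10):
    """Blend a VI backup with a PI-style evaluation, weighted by lam in [0, 1].

    lam = 0 recovers pure value iteration; larger lam leans toward
    policy iteration's full policy evaluation.
    """
    V = np.zeros(n_states)
    idx = np.arange(n_states)
    for _ in range(iters):
        Q = R + gamma * (P @ V)          # Q[s, a] one-step lookahead
        pi = Q.argmax(axis=1)            # greedy policy improvement
        vi_target = Q.max(axis=1)        # one-step VI backup
        # PI-style refinement: evaluate the greedy policy for a few steps.
        V_pi = vi_target.copy()
        for _ in range(eval_steps):
            V_pi = R[idx, pi] + gamma * (P[idx, pi] @ V_pi)
        # Balance parameter lam interpolates between the two updates.
        V = (1 - lam) * vi_target + lam * V_pi
    return V

V_vi = blended_iteration(lam=0.0)   # pure value iteration
V_mix = blended_iteration(lam=0.5)  # balanced update
```

With rewards in [0, 1] and V initialized at zero, both variants converge monotonically to the same optimal value function on this toy MDP; the balanced update simply takes larger steps per iteration, which mirrors the paper's motivation of accelerating VI without requiring an initial admissible policy.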

Original language: English
Article number: 8657988
Pages (from-to): 3948-3958
Number of pages: 11
Journal: IEEE Transactions on Systems, Man, and Cybernetics: Systems
Volume: 50
Issue number: 11
Publication status: Published - Nov 2020

Keywords

  • Adaptive dynamic programming
  • Bellman equation
  • discrete-time
  • neural network (NN)
  • optimal control
