Abstract
In this article, the data-based two-player zero-sum game problem is considered for linear discrete-time systems. Solving this problem theoretically amounts to solving the discrete-time game algebraic Riccati equation (DTGARE), which requires complete knowledge of the system dynamics. To avoid solving the DTGARE, the Q-function is introduced and a data-based policy iteration Q-learning (PIQL) algorithm is developed to learn the optimal Q-function from data collected from the real system. By writing the Q-function in quadratic form, it is proved, using the Fréchet derivative, that the PIQL algorithm is equivalent to the Newton iteration method in a Banach space. The convergence of the PIQL algorithm then follows from Kantorovich's theorem. To implement the PIQL algorithm, an off-policy learning scheme is proposed that uses measured data rather than the system model. Finally, the efficiency of the developed data-based PIQL method is validated through simulation studies.
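To make the iteration structure behind the abstract concrete, the sketch below runs policy iteration on the quadratic Q-function of a zero-sum linear-quadratic game. Note the important caveat: the paper's algorithm is *data-based* (the Q-matrix is estimated from collected trajectories via off-policy learning), whereas this is a *model-based* illustration that uses a hypothetical two-state system `(A, B, E)`, weights `Qx`, `R`, and attenuation level `gamma` chosen purely for demonstration. Each iteration evaluates the current control/disturbance policy pair `(K, L)` to get the quadratic Q-matrix `H`, then improves both policies from the saddle point of `z' H z` over `(u, w)`.

```python
import numpy as np

# Hypothetical 2-state plant, for illustration only (not from the paper).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])       # state matrix (open-loop stable)
B = np.array([[0.0],
              [1.0]])            # control input matrix
E = np.array([[0.1],
              [0.0]])            # disturbance input matrix
n, m, q = 2, 1, 1

Qx = np.eye(n)                   # state weight
R = np.eye(m)                    # control weight
gamma = 5.0                      # disturbance attenuation level

# Stage-cost weight on the augmented vector z = [x; u; w]
Lam = np.zeros((n + m + q, n + m + q))
Lam[:n, :n] = Qx
Lam[n:n + m, n:n + m] = R
Lam[n + m:, n + m:] = -gamma**2 * np.eye(q)

G = np.hstack([A, B, E])         # x_{k+1} = G @ z_k

def dlyap(Ac, Qc):
    """Solve the discrete Lyapunov equation P = Ac' P Ac + Qc."""
    N = Ac.shape[0]
    vecP = np.linalg.solve(np.eye(N * N) - np.kron(Ac.T, Ac.T),
                           Qc.reshape(-1))
    return vecP.reshape(N, N)

K = np.zeros((m, n))             # initial admissible control policy u = K x
L = np.zeros((q, n))             # initial disturbance policy w = L x

for _ in range(50):
    # Policy evaluation: value P of the pair (K, L), then the Q-matrix H,
    # so that Q(x, u, w) = z' H z with z = [x; u; w].
    Ac = A + B @ K + E @ L
    Qc = Qx + K.T @ R @ K - gamma**2 * (L.T @ L)
    P = dlyap(Ac, Qc)
    H = Lam + G.T @ P @ G

    # Policy improvement: stationary (saddle) point of z' H z over (u, w).
    KL = -np.linalg.solve(H[n:, n:], H[n:, :n])
    K_new, L_new = KL[:m, :], KL[m:, :]
    if np.linalg.norm(K_new - K) + np.linalg.norm(L_new - L) < 1e-10:
        K, L = K_new, L_new
        break
    K, L = K_new, L_new
```

The quadratic parameterization is what makes the data-based version possible: since `Q(x, u, w) = z' H z` is linear in the entries of `H`, those entries can be identified from measured `(x, u, w, x_next)` samples by least squares instead of from `(A, B, E)`, which is the role of the off-policy scheme in the paper.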
| Original language | English |
|---|---|
| Article number | 9005399 |
| Pages (from-to) | 3630-3640 |
| Number of pages | 11 |
| Journal | IEEE Transactions on Cybernetics |
| Volume | 51 |
| Issue number | 7 |
| DOIs | |
| Publication status | Published - Jul 2021 |
Keywords
- Adaptive dynamic programming (ADP)
- Q-learning
- discrete-time systems
- policy iteration
- two-player zero-sum game