TY - CONF
T1 - Scalable Reinforcement Learning for Linear-Quadratic Control of Networks
AU - Olsson, Johan
AU - Zhang, Runyu (Cathy)
AU - Tegling, Emma
AU - Li, Na
PY - 2024
Y1 - 2024
N2 - Distributed optimal control is known to be challenging and can become intractable even for linear-quadratic regulator problems. In this work, we study a special class of such problems where distributed state feedback controllers can give near-optimal performance. More specifically, we consider networked linear-quadratic controllers with decoupled costs and spatially exponentially decaying dynamics. We aim to exploit the structure in the problem to design a scalable reinforcement learning algorithm for learning a distributed controller. Recent work has shown that the optimal controller can be well approximated using only information from a kq-neighborhood of each agent. Motivated by these results, we show that similar results hold for the agents' individual value and Q-functions. We continue by designing an algorithm, based on the actor-critic framework, that learns distributed controllers using only local information. Specifically, the Q-function is estimated by modifying the Least Squares Temporal Difference for Q-functions method to use only local information. The algorithm then updates the policy using gradient descent. Finally, we evaluate the algorithm through simulations that indeed suggest near-optimal performance.
UR - https://www.scopus.com/pages/publications/85204427546
DO - 10.23919/ACC60939.2024.10644413
M3 - Paper in conference proceeding
AN - SCOPUS:85204427546
T3 - Proceedings of the American Control Conference
SP - 1813
EP - 1818
BT - Proceedings of the American Control Conference
PB - IEEE - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 American Control Conference, ACC 2024
Y2 - 10 July 2024 through 12 July 2024
ER -