July 12, 2018
Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control
International Conference on Machine Learning (ICML)
In this paper, we study an instance of Thompson sampling (TS) in the challenging setting of infinite-horizon linear quadratic (LQ) control, which models problems with continuous state-action variables, linear dynamics, and quadratic cost.
By: Marc Abeille, Alessandro Lazaric
Facebook AI Research