Statistically linearized least-squares temporal differences

Abstract: A common drawback of standard reinforcement learning algorithms is their inability to scale up to real-world problems. For this reason, (state-action) value function approximation is an important current research trend. A prominent value function approximator is the least-squares temporal differences (LSTD) algorithm. However, for technical reasons, linearity is mandatory: the parameterization of the value function must be linear (compact nonlinear representations are not allowed) and only the Bellman evaluation operator can be considered (imposing policy-iteration-like schemes). In this paper, this restriction of LSTD is lifted thanks to a derivative-free statistical linearization approach. In this way, nonlinear parameterizations and the Bellman optimality operator can be taken into account (the latter allowing value-iteration-like schemes). The efficiency of the resulting algorithms is demonstrated using a linear parameterization and neural networks, as well as on a Q-learning-like problem. A theoretical analysis is also provided.
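
The abstract names two ingredients: classic LSTD for linear parameterizations, and a derivative-free statistical linearization that extends it to nonlinear ones. Below is a minimal Python sketch of both ingredients, assuming a generic sigma-point style linearization; it only illustrates the idea and is not the exact slLSTD recursion derived in the paper (the function names, the toy chain, and the tanh parameterization are illustrative assumptions).

```python
# Hedged sketch: batch LSTD for a linear parameterization, plus a generic
# derivative-free statistical linearization step (sigma-point style) that could
# stand in for a gradient when the value function is nonlinear in its
# parameters. Illustration only, not the paper's algorithm.
import numpy as np


def lstd(transitions, phi, gamma=0.95):
    """Batch LSTD: solve A theta = b with
    A = sum_t phi(s_t)(phi(s_t) - gamma phi(s_{t+1}))^T,  b = sum_t phi(s_t) r_t."""
    n = len(phi(transitions[0][0]))
    A = np.zeros((n, n))
    b = np.zeros(n)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)
        b += f * r
    return np.linalg.solve(A, b)


def statistical_linearization(f, theta_mean, theta_cov, kappa=0.0):
    """Derivative-free linearization of a scalar function f(theta) around
    (theta_mean, theta_cov) using sigma points:  f(theta) ~= a @ theta + c."""
    n = theta_mean.size
    scale = n + kappa
    sqrt_cov = np.linalg.cholesky(scale * theta_cov)
    # 2n+1 sigma points and their weights.
    points = [theta_mean] \
        + [theta_mean + sqrt_cov[:, i] for i in range(n)] \
        + [theta_mean - sqrt_cov[:, i] for i in range(n)]
    weights = np.array([kappa / scale] + [0.5 / scale] * (2 * n))
    values = np.array([f(p) for p in points])
    y_mean = weights @ values
    # Cross-covariance between parameters and function values.
    P_ty = sum(w * (p - theta_mean) * (v - y_mean)
               for w, p, v in zip(weights, points, values))
    a = np.linalg.solve(theta_cov, P_ty)  # statistically linearized "gradient"
    c = y_mean - a @ theta_mean           # intercept
    return a, c


if __name__ == "__main__":
    # Toy 2-state chain with one-hot features, just to show the LSTD call.
    phi = lambda s: np.eye(2)[s]
    transitions = [(0, 1.0, 1), (1, 0.0, 0)] * 50
    print("LSTD parameters:", lstd(transitions, phi))

    # Linearize a hypothetical nonlinear parameterization (a single tanh unit).
    v = lambda theta: np.tanh(theta @ np.array([1.0, -0.5]))
    a, c = statistical_linearization(v, np.zeros(2), np.eye(2))
    print("slope:", a, "intercept:", c)
```

In a statistically linearized variant, the slope returned by the sigma-point step would play the role that the fixed feature vector plays in the linear LSTD system above, which is what makes nonlinear parameterizations tractable without computing derivatives.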
Document type: Conference papers

https://hal-supelec.archives-ouvertes.fr/hal-00553913
Contributor: Sébastien van Luchene
Submitted on: Monday, January 10, 2011 - 11:13:53 AM
Last modification on: Thursday, March 29, 2018 - 11:06:04 AM

Citation

Matthieu Geist, Olivier Pietquin. Statistically linearized least-squares temporal differences. ICUMT 2010, Oct 2010, Moscow, Russia. pp.450-457, ⟨10.1109/ICUMT.2010.5676598⟩. ⟨hal-00553913⟩
