Parametric value function approximation: A unified view

Abstract: Reinforcement learning (RL) is a machine-learning answer to the optimal control problem. It consists of learning an optimal control policy through interactions with the system to be controlled, the quality of this policy being quantified by the so-called value function. An important RL subtopic is approximating this function when the system is too large for an exact representation. This survey reviews and unifies state-of-the-art methods for parametric value function approximation by grouping them into three main categories: bootstrapping, residual, and projected fixed-point approaches. Related algorithms are derived by pairing one of the associated cost functions with a specific way of minimizing it, almost always stochastic gradient descent or a recursive least-squares approach.
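As an illustration of the bootstrapping family the abstract mentions, the following sketch (not the paper's code) shows TD(0) with a linear parametric value function V_theta(s) = theta · phi(s), updated by stochastic gradient descent on the bootstrapped temporal-difference error; the environment and feature map are hypothetical.

```python
import numpy as np

def td0_linear(transitions, n_features, phi, alpha=0.1, gamma=0.9):
    """Estimate theta for a linear value function from (s, r, s_next, done)
    transitions, using the TD(0) bootstrapped target and an SGD step."""
    theta = np.zeros(n_features)
    for s, r, s_next, done in transitions:
        v = theta @ phi(s)
        v_next = 0.0 if done else theta @ phi(s_next)
        td_error = r + gamma * v_next - v   # bootstrapped target r + gamma*V(s')
        theta += alpha * td_error * phi(s)  # stochastic gradient step on theta
    return theta

# Hypothetical 2-state chain: state 0 -> state 1 (reward 0),
# state 1 -> terminal (reward 1); one-hot features.
phi = lambda s: np.eye(2)[s]
episode = [(0, 0.0, 1, False), (1, 1.0, 1, True)]
theta = td0_linear(episode * 500, 2, phi, alpha=0.1, gamma=0.9)
# theta[1] approaches 1.0 and theta[0] approaches gamma * theta[1]
```

The residual and projected fixed-point families discussed in the paper replace this update with, respectively, a gradient on the Bellman residual and a least-squares projection of the Bellman image onto the feature space.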
Document type: Conference paper

Cited literature: 49 references

https://hal-supelec.archives-ouvertes.fr/hal-00618112
Contributor: Sébastien van Luchene
Submitted on: Wednesday, August 31, 2011, 4:47:05 PM
Last modification on: Wednesday, July 31, 2019, 4:18:02 PM
Long-term archiving on: Thursday, December 1, 2011, 2:30:56 AM

File: ADPRL_2011_MGOP.pdf (produced by the author(s))


Citation

Matthieu Geist, Olivier Pietquin. Parametric value function approximation: A unified view. ADPRL 2011, Apr 2011, Paris, France. pp.9-16, ⟨10.1109/ADPRL.2011.5967355⟩. ⟨hal-00618112⟩

Metrics
Record views: 202
File downloads: 395