Kalman Temporal Differences: the deterministic case

Abstract: This paper deals with value function and $Q$-function approximation in deterministic Markovian decision processes. A general statistical framework based on the Kalman filtering paradigm is introduced. Its principle is to adopt a parametric representation of the value function, to model the associated parameter vector as a random variable, and to minimize the mean-squared error of the parameters conditioned on past observed transitions. From this general framework, called Kalman Temporal Differences (KTD), and using an approximation scheme known as the unscented transform, a family of algorithms is derived: KTD-V, KTD-SARSA and KTD-Q, which aim respectively at estimating the value function of a given policy, the $Q$-function of a given policy, and the optimal $Q$-function. The proposed approach holds for both linear and nonlinear parameterizations. The framework is discussed, and its potential advantages and shortcomings are highlighted.
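The abstract names the ingredients of KTD without spelling out an update rule. The sketch below shows one way they can fit together for KTD-V, under explicit assumptions that are not taken from this record: the parameters follow a random walk with isotropic noise variance `process_var`, the observed reward equals $\hat V_\theta(s) - \gamma \hat V_\theta(s')$ plus Gaussian noise of variance `obs_var`, and the unscented transform uses the basic symmetric set of $2n+1$ sigma points with spread parameter `kappa`. The names `sigma_points` and `ktd_step`, the tabular features and the two-state demo chain are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np


def sigma_points(mean, cov, kappa=1.0):
    """Basic symmetric unscented transform: 2n+1 sigma points and their weights."""
    n = mean.size
    # Columns of the Cholesky factor of (n + kappa) * cov give the spread directions.
    root = np.linalg.cholesky((n + kappa) * cov)
    points = [mean]
    for j in range(n):
        points.append(mean + root[:, j])
        points.append(mean - root[:, j])
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    return np.array(points), weights


def ktd_step(theta, P, reward, observe, process_var=1e-3, obs_var=1e-3, kappa=1.0):
    """One hypothetical KTD update for a single observed transition.

    observe(x) maps a parameter vector x to the reward predicted for this
    transition, e.g. V_x(s) - gamma * V_x(s_next) for KTD-V.
    """
    n = theta.size
    # Prediction step: random-walk evolution of the parameters (assumed model).
    P_pred = P + process_var * np.eye(n)
    X, w = sigma_points(theta, P_pred, kappa)
    # Propagate each sigma point through the (possibly nonlinear) observation model.
    R = np.array([observe(x) for x in X])
    r_hat = w @ R
    # Cross-covariance between parameters and predicted reward, and reward variance.
    P_theta_r = ((X - theta).T * w) @ (R - r_hat)
    P_r = w @ (R - r_hat) ** 2 + obs_var
    # Kalman gain, then correction driven by the innovation reward - r_hat.
    K = P_theta_r / P_r
    theta_new = theta + K * (reward - r_hat)
    P_new = P_pred - np.outer(K, K) * P_r
    return theta_new, P_new


# Demo (hypothetical): deterministic two-state cycle s0 -> s1 (reward 1), s1 -> s0 (reward 0).
gamma = 0.9
phi = np.eye(2)  # tabular one-hot features, so V_x(s) = phi[s] @ x
transitions = [(0, 1, 1.0), (1, 0, 0.0)]
theta, P = np.zeros(2), 10.0 * np.eye(2)
for i in range(500):
    s, s_next, r = transitions[i % 2]
    observe = lambda x, s=s, s_next=s_next: phi[s] @ x - gamma * phi[s_next] @ x
    theta, P = ktd_step(theta, P, r, observe)
print(theta)  # should approach the true values [1/0.19, 0.9/0.19]
```

Only the `observe` callable changes between variants: for KTD-SARSA it would compute $\hat Q_x(s,a) - \gamma \hat Q_x(s',a')$, and for KTD-Q $\hat Q_x(s,a) - \gamma \max_b \hat Q_x(s',b)$. Because the unscented transform only evaluates `observe` at the sigma points, it needs no derivatives, so a nonlinear parameterization or the non-differentiable max can be plugged in unchanged.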
Document type: Conference paper
ADPRL 2009, Mar 2009, Nashville, TN, United States, pp. 185-192, DOI: 10.1109/ADPRL.2009.4927543


https://hal-supelec.archives-ouvertes.fr/hal-00380870
Contributor: Sébastien Van Luchene
Submitted on: Wednesday, May 6, 2009 - 9:44:31 AM
Last modification on: Tuesday, December 8, 2009 - 9:55:44 AM
Archived on: Thursday, June 10, 2010 - 10:41:48 PM

File: Supelec471.pdf (publisher files allowed on an open archive)


Citation

Matthieu Geist, Olivier Pietquin, Gabriel Fricout. Kalman Temporal Differences: the deterministic case. ADPRL 2009, Mar 2009, Nashville, TN, United States. pp.185-192, 2009, <10.1109/ADPRL.2009.4927543>. <hal-00380870>

