Kalman Temporal Differences

Matthieu Geist 1, Olivier Pietquin 2
2 IMS - Equipe Information, Multimodalité et Signal, UMI 2958 - Georgia Tech - CNRS [Metz], SUPELEC - Campus Metz
Abstract: Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest over the last decade. This contribution introduces a novel approximation scheme, the Kalman Temporal Differences (KTD) framework, which exhibits the following features: sample efficiency, non-linear approximation, non-stationarity handling and uncertainty management. A first KTD-based algorithm is provided for deterministic Markov Decision Processes (MDPs); it produces biased estimates in the case of stochastic transitions. Then the eXtended KTD framework (XKTD), which handles stochastic MDPs, is described. Convergence is analyzed in special cases for both deterministic and stochastic transitions. The related algorithms are evaluated on classical benchmarks and compare favorably to the state of the art while exhibiting the announced features.
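
To make the idea concrete: KTD casts value function estimation as a filtering problem, with the parameter vector of the approximator as a hidden state evolving by a random walk, and each observed reward tied to the parameters through the Bellman equation. The sketch below illustrates the linear special case, V(s) = θᵀφ(s), for which the observation model r_t = θᵀ(φ(s_t) - γφ(s_{t+1})) + noise is linear and a plain Kalman update suffices. The feature map, noise variances and class interface are illustrative assumptions on my part; the paper's general algorithm instead uses the unscented transform to accommodate non-linear parameterizations.

```python
import numpy as np

class LinearKTD:
    """Minimal sketch of Kalman Temporal Differences for a linear value
    function V(s) = theta . phi(s), assuming deterministic transitions.
    The parameter vector theta is the hidden state of a Kalman filter
    with a random-walk evolution model (names and defaults are illustrative)."""

    def __init__(self, n_features, gamma=0.95,
                 prior_var=10.0, process_noise=1e-3, obs_noise=1.0):
        self.gamma = gamma
        self.theta = np.zeros(n_features)            # parameter estimate
        self.P = prior_var * np.eye(n_features)      # parameter covariance
        self.Q = process_noise * np.eye(n_features)  # evolution (process) noise
        self.R = obs_noise                           # observation noise variance

    def update(self, phi_s, phi_next, reward):
        # Prediction step: the random walk leaves theta unchanged while the
        # covariance grows by Q, which is what allows tracking a
        # non-stationary target.
        P = self.P + self.Q
        # Observation model: reward = theta . (phi_s - gamma * phi_next) + noise,
        # i.e. the temporal-difference form of the Bellman equation.
        h = phi_s - self.gamma * phi_next
        innovation = reward - h @ self.theta
        s = h @ P @ h + self.R   # innovation variance (scalar)
        k = P @ h / s            # Kalman gain
        # Correction step.
        self.theta = self.theta + k * innovation
        self.P = P - np.outer(k, h @ P)
        return innovation

    def value(self, phi_s):
        return self.theta @ phi_s

    def value_uncertainty(self, phi_s):
        # Variance of the value estimate at a state with features phi_s.
        return phi_s @ self.P @ phi_s
```

The parameter covariance P is what provides the announced uncertainty management: φ(s)ᵀPφ(s) is the variance of the value estimate at s, usable for instance to drive exploration. For stochastic transitions this update is biased, which is what motivates the XKTD extension described in the paper.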
Document type: Journal article

https://hal-supelec.archives-ouvertes.fr/hal-00858687
Contributor: Sébastien van Luchene
Submitted on: Thursday, September 5, 2013 - 5:22:17 PM
Last modification on: Wednesday, July 31, 2019 - 4:18:03 PM

Identifiers

  • HAL Id: hal-00858687, version 1

Citation

Matthieu Geist, Olivier Pietquin. Kalman Temporal Differences. Journal of Artificial Intelligence Research, 39:483-532, 2010. ⟨hal-00858687⟩
