Kalman Temporal Differences

Matthieu Geist 1, Olivier Pietquin 2
2 IMS - Equipe Information, Multimodalité et Signal, UMI 2958 - Georgia Tech - CNRS [Metz], SUPELEC - Campus Metz
Abstract: Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest over the last decade. This contribution introduces a novel approximation scheme, the Kalman Temporal Differences (KTD) framework, which exhibits the following features: sample efficiency, non-linear approximation, non-stationarity handling and uncertainty management. A first KTD-based algorithm is provided for deterministic Markov Decision Processes (MDPs); it produces biased estimates in the case of stochastic transitions. Then the eXtended KTD framework (XKTD), which handles stochastic MDPs, is described. Convergence is analyzed in special cases for both deterministic and stochastic transitions. The related algorithms are evaluated on classical benchmarks and compare favorably to the state of the art while exhibiting the announced features.
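
To make the idea concrete: KTD casts value function estimation as a filtering problem, with the parameter vector of the approximator as a hidden state evolving by a random walk, and each observed reward tied to the parameters through the Bellman equation. The sketch below illustrates the linear special case, V(s) = θᵀφ(s), for which the observation model r_t = θᵀ(φ(s_t) - γφ(s_{t+1})) + noise is linear and a plain Kalman update suffices. The feature map, noise variances and class interface are illustrative assumptions on my part; the paper's general algorithm instead uses the unscented transform to accommodate non-linear parameterizations.

```python
import numpy as np

class LinearKTD:
    """Minimal sketch of Kalman Temporal Differences for a linear value
    function V(s) = theta . phi(s), assuming deterministic transitions.
    The parameter vector theta is the hidden state of a Kalman filter
    with a random-walk evolution model (names and defaults are illustrative)."""

    def __init__(self, n_features, gamma=0.95,
                 prior_var=10.0, process_noise=1e-3, obs_noise=1.0):
        self.gamma = gamma
        self.theta = np.zeros(n_features)            # parameter estimate
        self.P = prior_var * np.eye(n_features)      # parameter covariance
        self.Q = process_noise * np.eye(n_features)  # evolution (process) noise
        self.R = obs_noise                           # observation noise variance

    def update(self, phi_s, phi_next, reward):
        # Prediction step: the random walk leaves theta unchanged while the
        # covariance grows by Q, which is what allows tracking a
        # non-stationary target.
        P = self.P + self.Q
        # Observation model: reward = theta . (phi_s - gamma * phi_next) + noise,
        # i.e. the temporal-difference form of the Bellman equation.
        h = phi_s - self.gamma * phi_next
        innovation = reward - h @ self.theta
        s = h @ P @ h + self.R   # innovation variance (scalar)
        k = P @ h / s            # Kalman gain
        # Correction step.
        self.theta = self.theta + k * innovation
        self.P = P - np.outer(k, h @ P)
        return innovation

    def value(self, phi_s):
        return self.theta @ phi_s

    def value_uncertainty(self, phi_s):
        # Variance of the value estimate at a state with features phi_s.
        return phi_s @ self.P @ phi_s
```

The parameter covariance P is what provides the announced uncertainty management: φ(s)ᵀPφ(s) is the variance of the value estimate at s, usable for instance to drive exploration. For stochastic transitions this update is biased, which is what motivates the XKTD extension described in the paper.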
Document type: Journal article

https://hal-supelec.archives-ouvertes.fr/hal-00858687
Contributor: Sébastien van Luchene
Submitted on: Thursday, September 5, 2013 - 5:22:17 PM
Last modification on: Wednesday, July 31, 2019 - 4:18:03 PM

Identifiers

  • HAL Id: hal-00858687, version 1

Citation

Matthieu Geist, Olivier Pietquin. Kalman Temporal Differences. Journal of Artificial Intelligence Research, 39:483-532, 2010. ⟨hal-00858687⟩
