Tracking in Reinforcement Learning

Abstract: Reinforcement learning induces non-stationarity at several levels. Adaptation to non-stationary environments is of course a desirable feature of any RL algorithm. Yet, even if the environment of the learning agent can be considered stationary, generalized policy iteration frameworks, because they interleave learning and control, make the evaluated policy, and hence its value function, non-stationary. Tracking the optimal solution instead of trying to converge to it is therefore preferable. In this paper, we propose to handle this tracking issue with a Kalman-based temporal difference framework. Complexity and convergence are analyzed, and empirical investigations of the framework's ability to handle non-stationarity are finally provided.
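As a rough illustration of the idea, the sketch below takes a Kalman-filter view of TD policy evaluation for a linearly parameterized value function V(s) = theta^T phi(s). It is not the paper's exact KTD algorithm (which covers more general parameterizations); the feature map phi, the random-walk process noise Q and the observation noise R are illustrative assumptions. The process noise is what keeps the filter tracking a drifting value function instead of converging and then freezing.

import numpy as np

class LinearKalmanTD:
    """Minimal Kalman-filter sketch of TD policy evaluation with a
    linear value function V(s) = theta^T phi(s).

    The parameters theta are modelled as a random walk; the process-noise
    covariance Q lets the filter keep tracking a non-stationary value
    function rather than converging to a fixed point.
    """

    def __init__(self, n_features, gamma=0.95,
                 process_noise=1e-3, obs_noise=1.0, prior_var=10.0):
        self.gamma = gamma
        self.theta = np.zeros(n_features)            # parameter estimate
        self.P = prior_var * np.eye(n_features)      # parameter covariance
        self.Q = process_noise * np.eye(n_features)  # random-walk (tracking) noise
        self.R = obs_noise                           # observation-noise variance

    def update(self, phi_s, phi_s_next, reward):
        # Prediction step: random-walk dynamics on theta.
        P_pred = self.P + self.Q

        # Observation model: the TD relation rewritten as a linear observation,
        # r ~ (phi(s) - gamma * phi(s'))^T theta + noise.
        h = phi_s - self.gamma * phi_s_next
        innovation = reward - h @ self.theta
        s_var = h @ P_pred @ h + self.R              # innovation variance
        k = (P_pred @ h) / s_var                     # Kalman gain

        # Correction step.
        self.theta = self.theta + k * innovation
        self.P = P_pred - np.outer(k, h) @ P_pred
        return innovation

    def value(self, phi_s):
        return float(self.theta @ phi_s)

# Toy usage (hypothetical setup): track the value of a single state whose
# reward drifts over time; with gamma = 0 the value is the mean reward.
if __name__ == "__main__":
    ktd = LinearKalmanTD(n_features=1, gamma=0.0)
    phi = np.array([1.0])
    for t in range(200):
        reward = 1.0 if t < 100 else -1.0            # non-stationary reward
        ktd.update(phi, phi, reward)
    print(ktd.value(phi))                            # close to -1 after the switch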
Document type: Conference papers

https://hal-supelec.archives-ouvertes.fr/hal-00439316

Citation

Matthieu Geist, Olivier Pietquin, Gabriel Fricout. Tracking in Reinforcement Learning. 16th International Conference on Neural Information Processing - ICONIP 2009, Dec 2009, Bangkok, Thailand. pp.502-511, ⟨10.1007/978-3-642-10677-4_57⟩. ⟨hal-00439316⟩
