
From Supervised to Reinforcement Learning: a Kernel-based Bayesian Filtering Framework

Abstract : In many applications, engineers have to estimate a function linked to the state of a dynamic system. To do so, a sequence of samples drawn from this unknown function is observed while the system transits from state to state, and the problem is to generalize these observations to unvisited states. Several solutions can be envisioned, one of which is to regress a family of parameterized functions so that it best fits the observed samples. This is the first problem addressed with the proposed kernel-based Bayesian filtering approach, which also allows quantifying the uncertainty reduction that occurs as more samples are acquired. Classical methods cannot handle the case where the actual samples are not directly observable and only a nonlinear mapping of them is available, which happens when a special sensor has to be used or when solving the Bellman equation in order to control the system. The approach proposed in this paper, however, can be extended to this more difficult case. Moreover, an application of this indirect function approximation scheme to reinforcement learning is presented. A set of experiments is also proposed in order to demonstrate the efficiency of this kernel-based Bayesian approach.
Document type :
Journal articles

Contributor : Sébastien van Luchene
Submitted on : Friday, December 4, 2009 - 11:45:51 AM
Last modification on : Monday, December 14, 2020 - 2:10:02 PM
Long-term archiving on : Thursday, June 17, 2010 - 7:27:08 PM


Publisher files allowed on an open archive


  • HAL Id : hal-00429891, version 1



Matthieu Geist, Olivier Pietquin, Gabriel Fricout. From Supervised to Reinforcement Learning: a Kernel-based Bayesian Filtering Framework. International Journal On Advances in Software, IARIA, 2009, 2 (1), pp.101-116. ⟨hal-00429891⟩


