D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

R. Schoknecht, Optimality of Reinforcement Learning Algorithms with Linear Function Approximation, Conference on Neural Information Processing Systems (NIPS 15), 2002.

S. J. Bradtke and A. G. Barto, Linear Least-Squares Algorithms for Temporal Difference Learning, Machine Learning, pp.33-57, 1996.

L. C. Baird III, Residual Algorithms: Reinforcement Learning with Function Approximation, International Conference on Machine Learning (ICML 95), pp.30-37, 1995.

J. A. Boyan, Technical Update: Least-Squares Temporal Difference Learning, Machine Learning, pp.233-246, 1999.

R. E. Kalman, A New Approach to Linear Filtering and Prediction Problems, Journal of Basic Engineering, vol.82, issue.1, pp.35-45, 1960.
DOI : 10.1115/1.3662552

Y. Engel, Algorithms and Representations for Reinforcement Learning, 2005.

D. Choi and B. Van Roy, A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning, Discrete Event Dynamic Systems, vol.22, issue.1,2,3, pp.207-239, 2006.
DOI : 10.1007/s10626-006-8134-8

C. W. Phua and R. Fitch, Tracking Value Function Dynamics to Improve Reinforcement Learning with Piecewise Linear Function Approximation, Proceedings of the 24th International Conference on Machine Learning (ICML '07), 2007.
DOI : 10.1145/1273496.1273591

S. J. Julier and J. K. Uhlmann, Unscented Filtering and Nonlinear Estimation, Proceedings of the IEEE, pp.401-422, 2004.
DOI : 10.1109/JPROC.2003.823141

M. Geist, O. Pietquin, and G. Fricout, Bayesian Reward Filtering, Proceedings of the European Workshop on Reinforcement Learning, ser. Lecture Notes in Artificial Intelligence, pp.96-109, 2008.
DOI : 10.1007/978-3-540-89722-4_8
URL : https://hal.archives-ouvertes.fr/hal-00351282

D. Simon, Optimal State Estimation: Kalman, H∞, and Nonlinear Approaches, 2006.
DOI : 10.1002/0470045345

C. M. Bishop, Neural Networks for Pattern Recognition, 1995.

M. Geist, O. Pietquin, and G. Fricout, A Sparse Nonlinear Bayesian Online Kernel Regression, Second International Conference on Advanced Engineering Computing and Applications in Sciences, pp.199-204, 2008.
DOI : 10.1109/ADVCOMP.2008.7
URL : https://hal.archives-ouvertes.fr/hal-00327081

R. van der Merwe, Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models, 2004.

A. Antos, C. Szepesvári, and R. Munos, Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path, Machine Learning, pp.89-129, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00830201