A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, pp.89-129, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00830201

L. C. Baird, Residual Algorithms: Reinforcement Learning with Function Approximation, Proceedings of the International Conference on Machine Learning (ICML 95), pp.30-37, 1995.
DOI : 10.1016/B978-1-55860-377-6.50013-X

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, 1996.
DOI : 10.1007/0-306-48332-7_333

S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee, Incremental Natural Actor-Critic Algorithms, Proceedings of the Twenty-First Annual Conference on Advances in Neural Information Processing Systems (NIPS), 2008.
DOI : 10.1016/j.automatica.2009.07.008

C. M. Bishop, Neural Networks for Pattern Recognition, 1995.

J. A. Boyan, Technical Update: Least-Squares Temporal Difference Learning, Machine Learning, pp.233-246, 1999.

S. J. Bradtke and A. G. Barto, Linear Least-Squares Algorithms for Temporal Difference Learning, Machine Learning, pp.33-57, 1996.

D. Choi and B. Van Roy, A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning, Discrete Event Dynamic Systems, pp.207-239, 2006.

R. Dearden, N. Friedman, and S. J. Russell, Bayesian Q-learning, AAAI/IAAI, pp.761-768, 1998.

Y. Engel, Algorithms and Representations for Reinforcement Learning, 2005.

Y. Engel, S. Mannor, and R. Meir, Reinforcement learning with Gaussian processes, Proceedings of the 22nd international conference on Machine learning, ICML '05, 2005.
DOI : 10.1145/1102351.1102377

M. Geist, O. Pietquin, and G. Fricout, Bayesian Reward Filtering, Proceedings of the European Workshop on Reinforcement Learning, pp.96-109, 2008.
DOI : 10.1007/978-3-540-89722-4_8

URL : https://hal.archives-ouvertes.fr/hal-00351282

M. Geist, O. Pietquin, and G. Fricout, Kalman Temporal Differences: Uncertainty and Value Function Approximation, NIPS Workshop on Model Uncertainty and Risk in Reinforcement Learning, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00351298

M. Geist, O. Pietquin, and G. Fricout, Différences temporelles de Kalman : le cas stochastique, Actes des Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes, 2009.

M. Geist, O. Pietquin, and G. Fricout, Kalman Temporal Differences: The deterministic case, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009.
DOI : 10.1109/ADPRL.2009.4927543

URL : https://hal.archives-ouvertes.fr/hal-00380870

S. Jo and S. W. Kim, Consistent Normalized Least Mean Square Filtering with Noisy Data Matrix, IEEE Transactions on Signal Processing, vol.53, issue.6, pp.2112-2123, 2005.

S. J. Julier and J. K. Uhlmann, Unscented Filtering and Nonlinear Estimation, Proceedings of the IEEE, pp.401-422, 2004.
DOI : 10.1109/JPROC.2003.823141

R. E. Kalman, A New Approach to Linear Filtering and Prediction Problems, Journal of Basic Engineering, vol.82, issue.1, pp.35-45, 1960.
DOI : 10.1115/1.3662552

M. G. Lagoudakis and R. Parr, Least-Squares Policy Iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

C. W. Phua and R. Fitch, Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation, Proceedings of the 24th international conference on Machine learning, ICML '07, 2007.
DOI : 10.1145/1273496.1273591

R. Schoknecht, Optimality of Reinforcement Learning Algorithms with Linear Function Approximation, Conference on Neural Information Processing Systems (NIPS 15), 2002.

O. Sigaud and O. Buffet, Processus décisionnels de Markov en intelligence artificielle, 2008.

D. Simon, Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches, 2006.
DOI : 10.1002/0470045345

A. L. Strehl, L. Li, E. Wiewiora, J. Langford, and M. L. Littman, PAC model-free reinforcement learning, Proceedings of the 23rd international conference on Machine learning, ICML '06, pp.881-888, 2006.
DOI : 10.1145/1143844.1143955

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 1998.

T. Söderström and P. Stoica, Instrumental variable methods for system identification, Circuits, Systems, and Signal Processing, vol.57, pp.1-9, 2002.
DOI : 10.1007/BFb0009019

R. van der Merwe and E. Wan, Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models, Proceedings of the Workshop on Advances in Machine Learning, 2003.