A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, pp.89-129, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00830201

L. C. Baird, Residual Algorithms: Reinforcement Learning with Function Approximation, Proceedings of the International Conference on Machine Learning (ICML 95), pp.30-37, 1995.
DOI : 10.1016/B978-1-55860-377-6.50013-X

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, 1996.
DOI : 10.1007/0-306-48332-7_333

S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee, Incremental Natural Actor-Critic Algorithms, Proceedings of NIPS 21, 2008.
DOI : 10.1016/j.automatica.2009.07.008

J. A. Boyan, Technical Update: Least-Squares Temporal Difference Learning, Machine Learning, pp.233-246, 1999.

S. J. Bradtke and A. G. Barto, Linear Least-Squares Algorithms for Temporal Difference Learning, Machine Learning, pp.33-57, 1996.

Y. Engel, Algorithms and Representations for Reinforcement Learning, 2005.

Y. Engel, S. Mannor, and R. Meir, Reinforcement learning with Gaussian processes, Proceedings of the 22nd International Conference on Machine Learning, ICML '05, 2005.
DOI : 10.1145/1102351.1102377

M. Geist, O. Pietquin, and G. Fricout, Différences Temporelles de Kalman, 2009.
DOI : 10.3166/ria.24.423-443

M. Geist, O. Pietquin, and G. Fricout, Kalman Temporal Differences: The deterministic case, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009.
DOI : 10.1109/ADPRL.2009.4927543

URL : https://hal.archives-ouvertes.fr/hal-00380870

S. J. Julier and J. K. Uhlmann, Unscented Filtering and Nonlinear Estimation, Proceedings of the IEEE, pp.401-422, 2004.
DOI : 10.1109/JPROC.2003.823141

R. E. Kalman, A New Approach to Linear Filtering and Prediction Problems, Journal of Basic Engineering, vol.82, issue.1, pp.35-45, 1960.
DOI : 10.1115/1.3662552

D. Precup, R. S. Sutton, and S. P. Singh, Eligibility Traces for Off-Policy Policy Evaluation, Proceedings of the Seventeenth International Conference on Machine Learning (ICML 00), pp.759-766, 2000.

D. Simon, Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches, 2006.
DOI : 10.1002/0470045345

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

R. van der Merwe, Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models, 2004.

C. J. Watkins, Learning from Delayed Rewards, 1989.