D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming (Optimization and Neural Computation Series, 3), Athena Scientific, 1996.

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, 1994.
DOI : 10.1002/9780470316887

M. Geist and O. Pietquin, A Brief Survey of Parametric Value Function Approximation, Supélec, Tech. Rep., 2010.

D. Choi and B. Van Roy, A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning, Discrete Event Dynamic Systems, vol. 16, no. 2, pp. 207-239, 2006.
DOI : 10.1007/s10626-006-8134-8

J. N. Tsitsiklis and B. Van Roy, An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, vol. 42, no. 5, pp. 674-690, 1997.
DOI : 10.1109/9.580874

G. Tesauro, Temporal difference learning and TD-Gammon, Communications of the ACM, vol. 38, no. 3, pp. 58-68, 1995.
DOI : 10.1145/203330.203343

F. S. Melo, S. P. Meyn, and M. I. Ribeiro, An analysis of reinforcement learning with function approximation, Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 664-671, 2008.
DOI : 10.1145/1390156.1390240

A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol. 71, no. 1, pp. 89-129, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00830201

L. C. Baird, Residual Algorithms: Reinforcement Learning with Function Approximation, Proceedings of the International Conference on Machine Learning (ICML 95), pp. 30-37, 1995.
DOI : 10.1016/B978-1-55860-377-6.50013-X

Y. Engel, S. Mannor, and R. Meir, Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning, Proceedings of the International Conference on Machine Learning (ICML 03), pp. 154-161, 2003.

C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.

Y. Engel, S. Mannor, and R. Meir, The Kernel Recursive Least-Squares Algorithm, IEEE Transactions on Signal Processing, vol. 52, no. 8, pp. 2275-2285, 2004.
DOI : 10.1109/TSP.2004.830985

Y. Engel, Algorithms and Representations for Reinforcement Learning, Ph.D. dissertation, The Hebrew University of Jerusalem, 2005.

Z. Chen, Bayesian Filtering: From Kalman Filters to Particle Filters, and Beyond, 2003.

Y. Engel, S. Mannor, and R. Meir, Reinforcement learning with Gaussian processes, Proceedings of the 22nd International Conference on Machine Learning (ICML '05), 2005.
DOI : 10.1145/1102351.1102377
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.5939

D. Precup, R. S. Sutton, and S. P. Singh, Eligibility Traces for Off-Policy Policy Evaluation, Proceedings of the Seventeenth International Conference on Machine Learning (ICML 00), pp. 759-766, 2000.

M. Geist, O. Pietquin, and G. Fricout, Kalman Temporal Differences: The deterministic case, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2009), 2009.
DOI : 10.1109/ADPRL.2009.4927543
URL : https://hal.archives-ouvertes.fr/hal-00380870

M. Geist and O. Pietquin, Kalman Temporal Differences, Journal of Artificial Intelligence Research, vol. 39, pp. 483-532, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00351297

C. W. Phua and R. Fitch, Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation, Proceedings of the 24th International Conference on Machine Learning (ICML '07), 2007.
DOI : 10.1145/1273496.1273591

M. Geist, O. Pietquin, and G. Fricout, Tracking in Reinforcement Learning, Proceedings of the 16th International Conference on Neural Information Processing, 2009.
DOI : 10.1007/978-3-642-10677-4_57
URL : https://hal.archives-ouvertes.fr/hal-00439316

R. E. Kalman, A New Approach to Linear Filtering and Prediction Problems, Journal of Basic Engineering, vol. 82, no. 1, pp. 35-45, 1960.
DOI : 10.1115/1.3662552

M. Geist and O. Pietquin, Statistically Linearized Recursive Least Squares, Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2010), Kittilä, Finland, 2010.
DOI : 10.1109/mlsp.2010.5589236
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.207.7882

S. J. Bradtke and A. G. Barto, Linear Least-Squares algorithms for temporal difference learning, Machine Learning, vol. 22, pp. 33-57, 1996.

T. Söderström and P. Stoica, Instrumental variable methods for system identification, Circuits, Systems, and Signal Processing, pp. 1-9, 2002.
DOI : 10.1007/BFb0009019

M. G. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol. 4, pp. 1107-1149, 2003.

J. A. Boyan, Technical Update: Least-Squares Temporal Difference Learning, Machine Learning, vol. 49, pp. 233-246, 2002.

H. Yu, Convergence of Least Squares Temporal Difference Methods Under General Conditions, Proceedings of the 27th International Conference on Machine Learning (ICML 2010), pp. 1207-1214, 2010.

M. Geist and O. Pietquin, Statistically linearized least-squares temporal differences, International Congress on Ultra Modern Telecommunications and Control Systems, 2010.
DOI : 10.1109/ICUMT.2010.5676598
URL : https://hal.archives-ouvertes.fr/hal-00554338

R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver et al., Fast gradient-descent methods for temporal-difference learning with linear function approximation, Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 993-1000, 2009.
DOI : 10.1145/1553374.1553501

S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee, Incremental natural actor-critic algorithms, Conference on Neural Information Processing Systems (NIPS), 2007.

H. R. Maei, C. Szepesvári, S. Bhatnagar, D. Precup, D. Silver et al., Convergent temporal-difference learning with arbitrary smooth function approximation, Advances in Neural Information Processing Systems 22, pp. 1204-1212, 2009.

H. R. Maei and R. S. Sutton, GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces, Proceedings of the Third Conference on Artificial General Intelligence (AGI-10), 2010.
DOI : 10.2991/agi.2010.22

H. R. Maei, C. Szepesvári, S. Bhatnagar, and R. S. Sutton, Toward Off-Policy Learning Control with Function Approximation, Proceedings of the 27th International Conference on Machine Learning (ICML 2010), 2010.

A. Samuel, Some studies in machine learning using the game of checkers, IBM Journal of Research and Development, vol. 3, no. 3, pp. 210-229, 1959.

G. Gordon, Stable Function Approximation in Dynamic Programming, Proceedings of the International Conference on Machine Learning (ICML 95), 1995.
DOI : 10.1016/B978-1-55860-377-6.50040-2

R. Munos, Performance Bounds in Lp norm for Approximate Value Iteration, SIAM Journal on Control and Optimization, 2007.
DOI : 10.1137/040614384
URL : https://hal.archives-ouvertes.fr/inria-00124685

D. Ormoneit and S. Sen, Kernel-Based Reinforcement Learning, Machine Learning, vol. 49, pp. 161-178, 2002.

M. Riedmiller, Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method, European Conference on Machine Learning (ECML), 2005.
DOI : 10.1007/11564096_32

D. Ernst, P. Geurts, and L. Wehenkel, Tree-Based Batch Mode Reinforcement Learning, Journal of Machine Learning Research, vol. 6, pp. 503-556, 2005.

A. Nedić and D. P. Bertsekas, Least Squares Policy Evaluation Algorithms with Linear Function Approximation, Discrete Event Dynamic Systems: Theory and Applications, pp. 79-110, 2003.

D. P. Bertsekas, V. Borkar, and A. Nedić, Improved Temporal Difference Methods with Linear Function Approximation, in Learning and Approximate Dynamic Programming, pp. 231-235, 2004.

D. P. Bertsekas and H. Yu, Projected equation methods for approximate solution of large linear systems, Journal of Computational and Applied Mathematics, vol. 227, no. 1, pp. 27-50, 2009.
DOI : 10.1016/j.cam.2008.07.037

D. P. Bertsekas, Projected Equations, Variational Inequalities, and Temporal Difference Methods, IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009.

H. Yu and D. P. Bertsekas, Q-Learning Algorithms for Optimal Stopping Based on Least Squares, Proceedings of European Control Conference, 2007.

X. Xu, D. Hu, and X. Lu, Kernel-Based Least Squares Policy Iteration for Reinforcement Learning, IEEE Transactions on Neural Networks, vol. 18, no. 4, pp. 973-992, 2007.
DOI : 10.1109/TNN.2007.899161
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.459.4489

T. Jung and D. Polani, Kernelizing LSPE(λ), IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp. 338-345, 2007.
DOI : 10.1109/adprl.2007.368208

A. Geramifard, M. Bowling, and R. S. Sutton, Incremental Least-Squares Temporal Difference Learning, 21st Conference of the American Association for Artificial Intelligence (AAAI 06), pp. 356-361, 2006.

A. Farahmand, M. Ghavamzadeh, C. Szepesvári, and S. Mannor, Regularized policy iteration, 22nd Annual Conference on Neural Information Processing Systems (NIPS 21), 2008.

J. Z. Kolter and A. Y. Ng, Regularization and feature selection in least-squares temporal difference learning, Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), 2009.
DOI : 10.1145/1553374.1553442