R. Bellman and S. Dreyfus, Functional approximation and dynamic programming, Mathematical Tables and Other Aids to Computation, pp.247-251, 1959.

M. Geist and O. Pietquin, Parametric value function approximation: A unified view, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp.9-16, 2011.
DOI : 10.1109/ADPRL.2011.5967355

URL : https://hal.archives-ouvertes.fr/hal-00618112

M. G. Lagoudakis and R. Parr, Least-squares policy iteration, J. Mach. Learn. Res., vol.4, pp.1107-1149, 2003.

S. J. Bradtke and A. G. Barto, Linear Least-Squares algorithms for temporal difference learning, Machine Learning, pp.33-57, 1996.

D. Ernst, P. Geurts, and L. Wehenkel, Tree-based batch mode reinforcement learning, J. Mach. Learn. Res., vol.6, pp.503-556, 2005.

G. Gordon, Stable Function Approximation in Dynamic Programming, Proceedings of the International Conference on Machine Learning (ICML), 1995.
DOI : 10.1016/B978-1-55860-377-6.50040-2

A. Nouri and M. Littman, Dimension reduction and its application to model-based exploration in continuous spaces, Machine Learning, pp.85-98, 2010.
DOI : 10.1007/s10994-010-5202-y

S. Vijayakumar, A. D'Souza, and S. Schaal, Incremental Online Learning in High Dimensions, Neural Computation, vol.17, issue.12, pp.2602-2634, 2005.
DOI : 10.1162/089976602753284491

H. Wold, Estimation of principal components and related models by iterative least squares, Multivariate Analysis, pp.391-420, 1966.

S. Schaal and C. G. Atkeson, Receptive Field Weighted Regression, ATR Human Information Processing Laboratories, 1997.

M. Geist, O. Pietquin, and G. Fricout, Bayesian Reward Filtering, Recent Advances in Reinforcement Learning, S. Girgin et al. (eds.), pp.96-109, 2008.
DOI : 10.1007/978-3-540-89722-4_8

URL : https://hal.archives-ouvertes.fr/hal-00351282