Bertsekas and Tsitsiklis, Neuro-Dynamic Programming (Optimization and Neural Computation Series, 3), Athena Scientific, 1996.
Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
DOI: 10.1002/9780470316887
A Brief Survey of Parametric Value Function Approximation, Supélec, Tech. Rep., 2010.
A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning, Discrete Event Dynamic Systems, vol. 22, no. 1-3, pp. 207-239, 2006.
DOI: 10.1007/s10626-006-8134-8
An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, vol. 42, no. 5, 1997.
DOI: 10.1109/9.580874
Temporal difference learning and TD-Gammon, Communications of the ACM, vol. 38, no. 3, 1995.
DOI: 10.1145/203330.203343
An analysis of reinforcement learning with function approximation, Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 664-671, 2008.
DOI: 10.1145/1390156.1390240
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, pp. 89-129, 2008.
URL: https://hal.archives-ouvertes.fr/hal-00830201
Residual Algorithms: Reinforcement Learning with Function Approximation, Proc. of the International Conference on Machine Learning (ICML 95), pp. 30-37, 1995.
DOI: 10.1016/B978-1-55860-377-6.50013-X
Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning, Proceedings of the International Conference on Machine Learning (ICML 03), pp. 154-161, 2003.
Gaussian Processes in Machine Learning, 2006.
DOI: 10.1162/089976602317250933
The Kernel Recursive Least-Squares Algorithm, IEEE Transactions on Signal Processing, vol. 52, no. 8, pp. 2275-2285, 2004.
DOI: 10.1109/TSP.2004.830985
Algorithms and Representations for Reinforcement Learning, 2005.
Bayesian Filtering: From Kalman Filters to Particle Filters, and Beyond, 2003.
Reinforcement learning with Gaussian processes, Proceedings of the 22nd International Conference on Machine Learning (ICML '05), 2005.
DOI: 10.1145/1102351.1102377
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.5939
Eligibility Traces for Off-Policy Policy Evaluation, Proceedings of the Seventeenth International Conference on Machine Learning (ICML 00), pp. 759-766, 2000.
Kalman Temporal Differences: The deterministic case, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009.
DOI: 10.1109/ADPRL.2009.4927543
URL: https://hal.archives-ouvertes.fr/hal-00380870
Kalman Temporal Differences, Journal of Artificial Intelligence Research, 2010.
URL: https://hal.archives-ouvertes.fr/hal-00351297
Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation, Proceedings of the 24th International Conference on Machine Learning (ICML '07), 2007.
DOI: 10.1145/1273496.1273591
Tracking in Reinforcement Learning, Proceedings of the 16th International Conference on Neural Information Processing, 2009.
DOI: 10.1007/978-3-642-10677-4_57
URL: https://hal.archives-ouvertes.fr/hal-00439316
A New Approach to Linear Filtering and Prediction Problems, Journal of Basic Engineering, vol. 82, no. 1, pp. 35-45, 1960.
DOI: 10.1115/1.3662552
Statistically Linearized Recursive Least Squares, Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2010), Kittilä, Finland, 2010.
DOI: 10.1109/mlsp.2010.5589236
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.207.7882
Linear Least-Squares algorithms for temporal difference learning, Machine Learning, pp. 33-57, 1996.
Instrumental variable methods for system identification, Circuits, Systems, and Signal Processing, pp. 1-9, 2002.
DOI: 10.1007/BFb0009019
Least-squares policy iteration, Journal of Machine Learning Research, vol. 4, pp. 1107-1149, 2003.
Technical Update: Least-Squares Temporal Difference Learning, Machine Learning, pp. 233-246, 1999.
Convergence of Least Squares Temporal Difference Methods Under General Conditions, International Conference on Machine Learning, pp. 1207-1214, 2010.
Statistically linearized least-squares temporal differences, International Congress on Ultra Modern Telecommunications and Control Systems, 2010.
DOI: 10.1109/ICUMT.2010.5676598
URL: https://hal.archives-ouvertes.fr/hal-00554338
Fast gradient-descent methods for temporal-difference learning with linear function approximation, Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 993-1000, 2009.
DOI: 10.1145/1553374.1553501
Incremental natural actor-critic algorithms, Conference on Neural Information Processing Systems (NIPS), 2007.
Convergent temporal-difference learning with arbitrary smooth function approximation, Advances in Neural Information Processing Systems 22, pp. 1204-1212, 2009.
GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces, Proceedings of the Third Conference on Artificial General Intelligence (AGI-10), 2010.
DOI: 10.2991/agi.2010.22
Toward Off-Policy Learning Control with Function Approximation, Proceedings of the 27th International Conference on Machine Learning (ICML 2010), 2010.
Some studies in machine learning using the game of checkers, IBM Journal of Research and Development, pp. 210-229, 1959.
Stable Function Approximation in Dynamic Programming, Proceedings of the International Conference on Machine Learning (ICML 95), 1995.
DOI: 10.1016/B978-1-55860-377-6.50040-2
Performance Bounds in Lp norm for Approximate Value Iteration, SIAM Journal on Control and Optimization, 2007.
DOI: 10.1137/040614384
URL: https://hal.archives-ouvertes.fr/inria-00124685
Kernel-Based Reinforcement Learning, Machine Learning, pp. 161-178, 2002.
Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, European Conference on Machine Learning (ECML), 2005.
DOI: 10.1007/11564096_32
Tree-Based Batch Mode Reinforcement Learning, Journal of Machine Learning Research, vol. 6, pp. 503-556, 2005.
Least Squares Policy Evaluation Algorithms with Linear Function Approximation, Discrete Event Dynamic Systems: Theory and Applications, pp. 79-110, 2003.
Learning and Approximate Dynamic Programming, ch. Improved Temporal Difference Methods with Linear Function Approximation, pp. 231-235, 2004.
Projected equation methods for approximate solution of large linear systems, Journal of Computational and Applied Mathematics, vol. 227, no. 1, pp. 27-50, 2009.
DOI: 10.1016/j.cam.2008.07.037
Projected Equations, Variational Inequalities, and Temporal Difference Methods, IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009.
Q-Learning Algorithms for Optimal Stopping Based on Least Squares, Proceedings of the European Control Conference, 2007.
Kernel-Based Least Squares Policy Iteration for Reinforcement Learning, IEEE Transactions on Neural Networks, vol. 18, no. 4, pp. 973-992, 2007.
DOI: 10.1109/TNN.2007.899161
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.459.4489
Kernelizing LSPE(λ), IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp. 338-345, 2007.
DOI: 10.1109/adprl.2007.368208
Incremental Least-Squares Temporal Difference Learning, 21st Conference of the American Association for Artificial Intelligence (AAAI 06), pp. 356-361, 2006.
Regularized policy iteration, 22nd Annual Conference on Neural Information Processing Systems (NIPS 21), 2008.
Regularization and feature selection in least-squares temporal difference learning, Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), 2009.
DOI: 10.1145/1553374.1553442