Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol.4, issue.1, 2010. ,
DOI : 10.2200/S00268ED1V01Y201005AIM009
Reinforcement Learning: State of the Art, 2012. ,
DOI : 10.1007/978-3-642-27645-3
Reinforcement Learning and Dynamic Programming Using Function Approximators, 2010. ,
Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2007. ,
DOI : 10.1002/9781118029176
A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning, Discrete Event Dynamic Systems, vol.22, issue.1,2,3, pp.207-239, 2006. ,
DOI : 10.1007/b98840
Residual Algorithms: Reinforcement Learning with Function Approximation, International Conference on Machine Learning (ICML), pp.30-37, 1995. ,
DOI : 10.1016/B978-1-55860-377-6.50013-X
URL : http://www.cs.cmu.edu/People/reinf/ml95/proc/baird.ps
Algorithms and Representations for Reinforcement Learning, 2005. ,
Kalman Temporal Differences, Journal of Artificial Intelligence Research, 2010. ,
DOI : 10.1109/adprl.2009.4927543
URL : https://hal.archives-ouvertes.fr/hal-00858687
Linear Least-Squares algorithms for temporal difference learning, Machine Learning, pp.33-57, 1996. ,
DOI : 10.1007/bf00114723
URL : https://link.springer.com/content/pdf/10.1007%2FBF00114723.pdf
Statistically linearized least-squares temporal differences, International Congress on Ultra Modern Telecommunications and Control Systems, 2010. ,
DOI : 10.1109/ICUMT.2010.5676598
URL : https://hal.archives-ouvertes.fr/hal-00554338
Fast gradient-descent methods for temporal-difference learning with linear function approximation, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.993-1000, 2009. ,
DOI : 10.1145/1553374.1553501
URL : http://webdocs.cs.ualberta.ca/~sutton/papers/gradTD1.pdf
Convergent temporal-difference learning with arbitrary smooth function approximation, Advances in Neural Information Processing Systems (NIPS), pp.1204-1212, 2009. ,
GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces, Proceedings of the 3rd Conference on Artificial General Intelligence (AGI-10), 2010. ,
DOI : 10.2991/agi.2010.22
Toward Off-Policy Learning Control with Function Approximation, International Conference on Machine Learning (ICML), 2010. ,
Least Squares Policy Evaluation Algorithms with Linear Function Approximation, Discrete Event Dynamic Systems: Theory and Applications, pp.79-110, 2003. ,
Q-Learning Algorithms for Optimal Stopping Based on Least Squares, European Control Conference, 2007. ,
Stable Function Approximation in Dynamic Programming, International Conference on Machine Learning (ICML), 1995. ,
DOI : 10.1016/B978-1-55860-377-6.50040-2
URL : http://www.cs.berkeley.edu/~pabbeel/cs287-fa09/readings/Gordon-1995.pdf
Tree-Based Batch Mode Reinforcement Learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005. ,
Learning to predict by the methods of temporal differences, Machine Learning, pp.9-44, 1988. ,
DOI : 10.3758/BF03205056
On-line Q-learning using connectionist systems, 1994. ,
Q-learning, Machine Learning, pp.279-292, 1992. ,
DOI : 10.1007/BF00992698
Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning, International Conference on Machine Learning (ICML), pp.154-161, 2003. ,
An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, vol.42, 1997. ,
Temporal difference learning and TD-Gammon, Communications of the ACM, vol.38, issue.3, 1995. ,
DOI : 10.1145/203330.203343
A theoretical and empirical analysis of Expected Sarsa, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009. ,
DOI : 10.1109/ADPRL.2009.4927542
Least Squares Temporal Difference Methods: An Analysis under General Conditions, SIAM Journal on Control and Optimization, vol.50, issue.6, 2010. ,
DOI : 10.1137/100807879
An analysis of reinforcement learning with function approximation, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.664-671, 2008. ,
DOI : 10.1145/1390156.1390240
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, pp.89-129, 2008. ,
URL : https://hal.archives-ouvertes.fr/hal-00830201
On Fréchet subdifferentials, Journal of Mathematical Sciences, vol.116, issue.3, pp.3325-3358, 2003. ,
DOI : 10.1023/A:1023673105317
An Introduction to Multivariate Statistical Analysis, 1984. ,
Reinforcement learning with Gaussian processes, Proceedings of the 22nd international conference on Machine learning, ICML '05, 2005. ,
DOI : 10.1145/1102351.1102377
URL : http://www-ee.technion.ac.il/~rmeir/Publications/EngelMannorMeirICML05.pdf
Eligibility Traces for Off-Policy Policy Evaluation, International Conference on Machine Learning (ICML), pp.759-766, 2000. ,
New extension of the Kalman filter to nonlinear systems, Signal Processing, Sensor Fusion, and Target Recognition VI, 1997. ,
DOI : 10.1117/12.280797
The scaled unscented transformation, Proceedings of the 2002 American Control Conference, pp.4555-4559, 2002. ,
DOI : 10.1109/ACC.2002.1025369
URL : http://www.cs.unc.edu/~welch/kalman/media/pdf/ACC02-IEEE1357.PDF
New developments in state estimation for nonlinear systems, Automatica, vol.36, issue.11, pp.1627-1638, 2000. ,
DOI : 10.1016/S0005-1098(00)00089-3
Sigma-point Kalman filters for probabilistic inference in dynamic state-space models, 2004. ,
Eligibility traces through colored noises, International Congress on Ultra Modern Telecommunications and Control Systems, 2010. ,
DOI : 10.1109/ICUMT.2010.5676597
URL : https://hal.archives-ouvertes.fr/hal-00553910
Uncertainty management for online optimisation of a POMDP-based large-scale spoken dialogue system, Annual Conference of the International Speech Communication Association (Interspeech), pp.1301-1304, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00652194
Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003. ,
Instrumental variable methods for system identification, Circuits, Systems, and Signal Processing, pp.1-9, 2002. ,
DOI : 10.1007/BFb0009019
Incremental Least-Squares Temporal Difference Learning, Conference of American Association for Artificial Intelligence (AAAI), pp.356-361, 2006. ,
Hybrid least-squares algorithms for approximate policy evaluation, Machine learning, 2009. ,
DOI : 10.1007/978-3-642-04180-8_9
URL : http://www.cs.duke.edu/%7Ejohns/pubs/johns_ml09.pdf
Statistically linearized recursive least squares, 2010 IEEE International Workshop on Machine Learning for Signal Processing. ,
DOI : 10.1109/MLSP.2010.5589236
URL : https://hal.archives-ouvertes.fr/hal-00553168
A Convergent O(n) Algorithm for Off-policy Temporal-difference Learning with Linear Function Approximation, Advances in Neural Information Processing Systems (NIPS), 2008. ,
Stochastic Simulation, 1987. ,
DOI : 10.1002/9780470316726
Some studies in machine learning using the game of checkers, IBM Journal of Research and Development, pp.210-229, 1959. ,
Performance Bounds in Lp norm for Approximate Value Iteration, SIAM Journal on Control and Optimization, 2007. ,
DOI : 10.1137/040614384
URL : https://hal.archives-ouvertes.fr/inria-00124685
Kernel-Based Reinforcement Learning, Machine Learning, pp.161-178, 2002. ,
Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, European Conference on Machine Learning (ECML), 2005. ,
Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, Labs for Information and Decision Systems, MIT, Tech. Rep. LIDS-P-2349, 1996. ,
Learning and Approximate Dynamic Programming, ch. Improved Temporal Difference Methods with Linear Function Approximation, pp.231-235, 2004. ,
Projected equation methods for approximate solution of large linear systems, Journal of Computational and Applied Mathematics, vol.227, issue.1, pp.27-50, 2007. ,
DOI : 10.1016/j.cam.2008.07.037
Projected Equations, Variational Inequalities, and Temporal Difference Methods, IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009. ,
The Linear Programming Approach to Approximate Dynamic Programming, Operations Research, vol.51, issue.6, pp.850-865, 2003. ,
DOI : 10.1287/opre.51.6.850.24925
The Smoothed Approximate Linear Program, Advances in Neural Information Processing Systems (NIPS), 2009. ,
Approximate Modified Policy Iteration, International Conference on Machine Learning (ICML), 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00697169
Approximately optimal approximate reinforcement learning, International Conference on Machine Learning (ICML), 2002. ,
Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics, vol.13, issue.5, pp.834-846, 1983. ,
DOI : 10.1109/TSMC.1983.6313077
On Actor-Critic Algorithms, SIAM Journal on Control and Optimization, vol.42, issue.4, pp.1143-1166, 2003. ,
DOI : 10.1137/S0363012901385691
Policy Gradient Methods for Reinforcement Learning with Function Approximation, Neural Information Processing Systems (NIPS), pp.1057-1063, 1999. ,
Natural Actor-Critic, Neurocomputing, vol.71, issue.7-9, pp.1180-1190, 2008. ,
DOI : 10.1016/j.neucom.2007.11.026
Incremental natural actor-critic algorithms, Advances in Neural Information Processing Systems (NIPS), 2007. ,
DOI : 10.1016/j.automatica.2009.07.008
URL : http://www.cs.ualberta.ca/~sutton/papers/BSGL-08.pdf
Revisiting Natural Actor-Critics with Value Function Approximation, International Conference on Modeling Decisions for Artificial Intelligence (MDAI), ser. LNAI, pp.207-218, 2010. ,
DOI : 10.1007/11596448_9
URL : https://hal.archives-ouvertes.fr/hal-00554346
Recursive Least-Squares Learning with Eligibility Traces, European Workshop on Reinforcement Learning, 2011. ,
DOI : 10.1007/978-3-642-29946-9_14
URL : https://hal.archives-ouvertes.fr/hal-00644511
Error Bounds for Approximate Policy Iteration, International Conference on Machine Learning (ICML), pp.560-567, 2003. ,
Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view, International Conference on Machine Learning (ICML), 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00537403
Finite-Sample Analysis of LSTD, International Conference on Machine Learning (ICML), 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00482189
Sample Efficient On-line Learning of Optimal Dialogue Policies with Kalman Temporal Differences, International Joint Conference on Artificial Intelligence (IJCAI), pp.1878-1883, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00618252
Technical Update: Least-Squares Temporal Difference Learning, Machine Learning, pp.233-246, 1999. ,
Off-policy learning in large-scale POMDP-based dialogue systems, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012. ,
DOI : 10.1109/ICASSP.2012.6289040
URL : https://hal.archives-ouvertes.fr/hal-00684819
Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation, Proceedings of the 24th international conference on Machine learning, ICML '07, 2007. ,
DOI : 10.1145/1273496.1273591
URL : http://imls.engr.oregonstate.edu/www/htdocs/proceedings/icml2007/papers/523.pdf
Speed/Accuracy Trade-Off between the Habitual and the Goal-Directed Processes, PLoS Computational Biology, vol.35, issue.5, 2011. ,
DOI : 10.1371/journal.pcbi.1002055.t002
URL : https://doi.org/10.1371/journal.pcbi.1002055
Sample-efficient batch reinforcement learning for dialogue management optimization, ACM Transactions on Speech and Language Processing, vol.7, issue.3, pp.1-7, 2011. ,
DOI : 10.1145/1966407.1966412
URL : https://hal.archives-ouvertes.fr/hal-00617517
Bias-Variance Error Bounds for Temporal Difference Updates, Conference on Learning Theory (COLT), 2000. ,
Recursive least-squares off-policy learning with eligibility traces, INRIA, Tech. Rep, 2012. ,
DOI : 10.1007/978-3-642-29946-9_14
URL : http://hal.inria.fr/docs/00/64/45/11/PDF/ewrl.pdf
Kernel-Based Least Squares Policy Iteration for Reinforcement Learning, IEEE Transactions on Neural Networks, vol.18, issue.4, pp.973-992, 2007. ,
DOI : 10.1109/TNN.2007.899161
URL : http://www.jilsa.net/files/ieee-tnn-paper-04267723.pdf
Kernelizing LSPE(λ), IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL), pp.338-345, 2007. ,
DOI : 10.1109/adprl.2007.368208
Kernelized value function approximation for reinforcement learning, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009. ,
DOI : 10.1145/1553374.1553504
URL : http://www.cs.mcgill.ca/~icml2009/papers/467.pdf
Sparse Temporal Difference Learning Using LASSO, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007. ,
DOI : 10.1109/ADPRL.2007.368210
URL : https://hal.archives-ouvertes.fr/inria-00117075
Regularization and feature selection in least-squares temporal difference learning, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009. ,
DOI : 10.1145/1553374.1553442
URL : http://www.cs.mcgill.ca/~icml2009/papers/439.pdf
Linear Complementarity for Regularized Policy Evaluation and Improvement, Advances in Neural Information Processing Systems (NIPS), pp.1009-1017, 2010. ,
ℓ1-Penalized Projected Bellman Residual, European Workshop on Reinforcement Learning (EWRL), 2011. ,
DOI : 10.1007/978-3-642-29946-9_12
URL : http://hal.inria.fr/docs/00/64/45/07/PDF/gs_ewrl_l1_cr.pdf
Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization, European Workshop on Reinforcement Learning (EWRL), 2011. ,
DOI : 10.1007/978-3-642-29946-9_13
A Dantzig Selector Approach to Temporal Difference Learning, International Conference on Machine Learning (ICML), 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00749480
Basis Function Adaptation in Temporal Difference Reinforcement Learning, Annals of Operations Research, vol.34, issue.1/2/3, pp.215-238, 2005. ,
URL : http://www.ee.technion.ac.il/people/shimkin/preprints/basisadaptation_dec03.pdf
Automatic basis function construction for approximate dynamic programming and reinforcement learning, Proceedings of the 23rd international conference on Machine learning , ICML '06, pp.449-456, 2006. ,
DOI : 10.1145/1143844.1143901
URL : http://www.ece.mcgill.ca/~smanno1//public/C-KellerPrecup-NCAICML-2006.pdf
Analyzing feature generation for value-function approximation, Proceedings of the 24th international conference on Machine learning, ICML '07, 2007. ,
DOI : 10.1145/1273496.1273589
URL : http://www.cs.duke.edu/~parr/icml07.pdf
An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008. ,
DOI : 10.1145/1390156.1390251
Automatic Induction of Bellman-Error Features for Probabilistic Planning, Journal of Artificial Intelligence Research, 2010. ,
Proto-value functions, Proceedings of the 22nd international conference on Machine learning, ICML '05, pp.2169-2231, 2007. ,
DOI : 10.1145/1102351.1102421
Adaptive aggregation methods for infinite horizon dynamic programming, IEEE Transactions on Automatic Control, vol.34, issue.6, pp.589-598, 1989. ,
DOI : 10.1109/9.24227
Reinforcement learning with soft state aggregation, Advances in neural information processing systems (NIPS), pp.361-368, 1995. ,
Convergence Analysis of Kernel-based On-policy Approximate Policy Iteration Algorithms for Markov Decision Processes with Continuous, Multidimensional States and Actions, 2010. ,
Reinforcement learning using kernel-based stochastic factorization, Advances in Neural Information Processing Systems (NIPS), 2011. ,