R. Sutton and A. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

S. Singh, M. Kearns, D. Litman, and M. Walker, Reinforcement learning for spoken dialogue systems, Proc. NIPS'99, 1999.

E. Levin, R. Pieraccini, and W. Eckert, A stochastic model of human-machine interaction for learning dialog strategies, IEEE Transactions on Speech and Audio Processing, vol.8, issue.1, pp.11-23, 2000.
DOI : 10.1109/89.817450

O. Pietquin and T. Dutoit, A probabilistic framework for dialog simulation and optimal strategy learning, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.2, pp.589-599, 2006.
DOI : 10.1109/TSA.2005.855836
URL : https://hal.archives-ouvertes.fr/hal-00207952

S. Young, M. Gasic, S. Keizer, F. Mairesse, J. Schatzmann et al., The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management, Computer Speech & Language, vol.24, issue.2, pp.150-174, 2010.
DOI : 10.1016/j.csl.2009.04.001
URL : https://hal.archives-ouvertes.fr/hal-00598186

W. Eckert, E. Levin, and R. Pieraccini, User modeling for spoken dialogue system evaluation, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings, 1997.
DOI : 10.1109/ASRU.1997.658991

J. Schatzmann, K. Weilhammer, M. Stuttle, and S. Young, A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies, The Knowledge Engineering Review, vol.21, issue.2, pp.97-126, 2006.
DOI : 10.1017/S0269888906000944

J. Schatzmann, M. N. Stuttle, K. Weilhammer, and S. Young, Effects of the user model on simulation-based learning of dialogue strategies, Proceedings of ASRU'05, 2005.

O. Pietquin, M. Geist, S. Chandramohan, and H. Frezza-Buet, Sample-efficient batch reinforcement learning for dialogue management optimization, ACM Transactions on Speech and Language Processing, vol.7, issue.3, 2011.
DOI : 10.1145/1966407.1966412
URL : https://hal.archives-ouvertes.fr/hal-00617517

Y. Engel, S. Mannor, and R. Meir, Reinforcement learning with Gaussian processes, Proceedings of the 22nd International Conference on Machine Learning, ICML '05, 2005.
DOI : 10.1145/1102351.1102377
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.5939

M. Gašić, F. Jurčíček, S. Keizer, F. Mairesse, B. Thomson et al., Gaussian processes for fast policy optimisation of POMDP-based dialogue managers, Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp.201-204, 2010.

C. E. Rasmussen and C. K. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
DOI : 10.1162/089976602317250933

M. Deisenroth, C. Rasmussen, and J. Peters, Gaussian process dynamic programming, Neurocomputing, vol.72, issue.7-9, pp.1508-1524, 2009.
DOI : 10.1016/j.neucom.2008.12.019
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.363.6558

L. P. Kaelbling, Learning in Embedded Systems, MIT Press, 1993.

A. L. Strehl and M. L. Littman, An analysis of model-based Interval Estimation for Markov Decision Processes, Journal of Computer and System Sciences, vol.74, issue.8, 2008.
DOI : 10.1016/j.jcss.2007.08.009

J. Z. Kolter and A. Y. Ng, Near-Bayesian exploration in polynomial time, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553441

J. Schatzmann, B. Thomson, K. Weilhammer, H. Ye, and S. Young, Agenda-based user simulation for bootstrapping a POMDP dialogue system, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers, NAACL '07, pp.149-152, 2007.
DOI : 10.3115/1614108.1614146

M. Geist and O. Pietquin, Kalman Temporal Differences, Journal of Artificial Intelligence Research, vol.39, pp.483-532, 2010.
DOI : 10.1109/adprl.2009.4927543
URL : https://hal.archives-ouvertes.fr/hal-00351297

O. Pietquin, M. Geist, and S. Chandramohan, Sample efficient on-line learning of optimal dialogue policies with Kalman temporal differences, International Joint Conference on Artificial Intelligence, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00618252