R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

S. Singh, M. Kearns, D. Litman, and M. Walker, Reinforcement learning for spoken dialogue systems, Proc. NIPS'99, 1999.

E. Levin, R. Pieraccini, and W. Eckert, A stochastic model of human-machine interaction for learning dialog strategies, IEEE TSAP, 2000.

O. Pietquin and T. Dutoit, A probabilistic framework for dialog simulation and optimal strategy learning, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.2, 2006.
DOI : 10.1109/TSA.2005.855836
URL : https://hal.archives-ouvertes.fr/hal-00207952

S. Young, M. Gasic, S. Keizer, F. Mairesse, J. Schatzmann et al., The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management, Computer Speech & Language, vol.24, issue.2, 2010.
DOI : 10.1016/j.csl.2009.04.001
URL : https://hal.archives-ouvertes.fr/hal-00598186

W. Eckert, E. Levin, and R. Pieraccini, User modeling for spoken dialogue system evaluation, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings, 1997.
DOI : 10.1109/ASRU.1997.658991

J. Schatzmann, K. Weilhammer, M. Stuttle, and S. Young, A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies, The Knowledge Engineering Review, 2006.

J. Schatzmann, M. N. Stuttle, K. Weilhammer, and S. Young, Effects of the user model on simulation-based learning of dialogue strategies, Proc. of ASRU'05, 2005.

M. Gašić, F. Jurčíček, S. Keizer, F. Mairesse, B. Thomson et al., Gaussian processes for fast policy optimisation of POMDP-based dialogue managers, Proc. of SIGDIAL 11, 2010.

F. Jurcicek, B. Thomson, S. Keizer, M. Gasic, F. Mairesse et al., Natural Belief-Critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems, Interspeech'10, 2010.

J. Williams and S. Young, Scaling up POMDPs for dialogue management: the summary POMDP method, Proc. of ASRU, 2005.

M. Geist and O. Pietquin, Kalman Temporal Differences, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00351297

O. Pietquin, M. Geist, and S. Chandramohan, Sample Efficient Online Learning of Optimal Dialogue Policies with Kalman Temporal Differences, Proc. of IJCAI, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00618252

J. Williams and S. Young, Partially observable Markov decision processes for spoken dialog systems, Computer Speech & Language, vol.21, issue.2, 2007.
DOI : 10.1016/j.csl.2006.06.008

J. Schatzmann, B. Thomson, K. Weilhammer, H. Ye, and S. Young, Agenda-based user simulation for bootstrapping a POMDP dialogue system, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers, NAACL '07, 2007.
DOI : 10.3115/1614108.1614146

M. Geist and O. Pietquin, Managing Uncertainty within the KTD Framework, Proc. of the AL&E workshop, JMLR W&CP, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00599636

L. Daubigney, M. Gasic, S. Chandramohan, M. Geist, O. Pietquin et al., Uncertainty management for on-line optimisation of a POMDP-based large-scale spoken dialogue system, Proc. of Interspeech, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00652194

J. Z. Kolter and A. Y. Ng, Near-Bayesian exploration in polynomial time, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553441

Y. Engel, S. Mannor, and R. Meir, Reinforcement learning with Gaussian processes, Proceedings of the 22nd international conference on Machine learning, ICML '05, 2005.
DOI : 10.1145/1102351.1102377