Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192
Reinforcement learning for spoken dialogue systems, Proc. NIPS'99, 1999.
A stochastic model of human-machine interaction for learning dialog strategies, IEEE Transactions on Speech and Audio Processing, vol.8, issue.1, pp.11-23, 2000.
DOI : 10.1109/89.817450
A probabilistic framework for dialog simulation and optimal strategy learning, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.2, pp.589-599, 2006.
DOI : 10.1109/TSA.2005.855836
URL : https://hal.archives-ouvertes.fr/hal-00207952
The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management, Computer Speech & Language, vol.24, issue.2, pp.150-174, 2010.
DOI : 10.1016/j.csl.2009.04.001
URL : https://hal.archives-ouvertes.fr/hal-00598186
User modeling for spoken dialogue system evaluation, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings, 1997.
DOI : 10.1109/ASRU.1997.658991
A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies, The Knowledge Engineering Review, vol.21, issue.2, pp.97-126, 2006.
DOI : 10.1017/S0269888906000944
Effects of the user model on simulation-based learning of dialogue strategies, Proceedings of ASRU'05, 2005.
Sample-efficient batch reinforcement learning for dialogue management optimization, ACM Transactions on Speech and Language Processing, vol.7, issue.3, 2011.
DOI : 10.1145/1966407.1966412
URL : https://hal.archives-ouvertes.fr/hal-00617517
Reinforcement learning with Gaussian processes, Proceedings of the 22nd International Conference on Machine Learning, ICML '05, 2005.
DOI : 10.1145/1102351.1102377
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.5939
Gaussian processes for fast policy optimisation of POMDP-based dialogue managers, Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp.201-204, 2010.
Gaussian Processes in Machine Learning, 2006.
DOI : 10.1162/089976602317250933
Gaussian process dynamic programming, Neurocomputing, vol.72, issue.7-9, pp.1508-1524, 2009.
DOI : 10.1016/j.neucom.2008.12.019
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.363.6558
Learning in embedded systems, 1993.
An analysis of model-based Interval Estimation for Markov Decision Processes, Journal of Computer and System Sciences, vol.74, issue.8, 2008.
DOI : 10.1016/j.jcss.2007.08.009
Near-Bayesian exploration in polynomial time, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553441
Agenda-based user simulation for bootstrapping a POMDP dialogue system, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers, NAACL '07, pp.149-152, 2007.
DOI : 10.3115/1614108.1614146
Kalman Temporal Differences, Journal of Artificial Intelligence Research, vol.39, pp.483-532, 2010.
DOI : 10.1109/adprl.2009.4927543
URL : https://hal.archives-ouvertes.fr/hal-00351297
Sample efficient on-line learning of optimal dialogue policies with Kalman temporal differences, International Joint Conference on Artificial Intelligence, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00618252