Sample-Efficient Batch Reinforcement Learning for Dialogue Management Optimization

Olivier Pietquin; Matthieu Geist; Senthilkumar Chandramohan; Hervé Frezza-Buet

doi:10.1145/1966407.1966412

Journal article, ACM - Transactions on Speech and Language Processing, Year: 2011


Olivier Pietquin

Role: Author
PersonId: 4024
IdHAL: olivier-pietquin
ORCID: 0000-0002-5386-465X
IdRef: 142821861

SUPELEC-Campus Metz

Georgia Tech Lorraine [Metz]

Matthieu Geist

Role: Author
PersonId: 6945
IdHAL: matthieu-geist

SUPELEC-Campus Metz

Georgia Tech Lorraine [Metz]

Senthilkumar Chandramohan

Role: Author
PersonId: 888330

SUPELEC-Campus Metz

Georgia Tech Lorraine [Metz]

Hervé Frezza-Buet

Role: Author
PersonId: 7778
IdHAL: herve-frezza-buet
IdRef: 154600733

SUPELEC-Campus Metz

Georgia Tech Lorraine [Metz]

Abstract

Spoken Dialogue Systems (SDS) interact with human beings using natural language. The dialogue policy largely determines how the dialogue management module behaves. Handcrafting this policy is not always an option, given the complexity of the dialogue task and the stochastic behavior of users. In recent years, Reinforcement Learning (RL) has proved to be an efficient approach to dialogue policy optimization. Yet most conventional RL algorithms are data intensive and require techniques such as user simulation, which is likely to introduce additional modeling errors. This paper explores the use of a set of approximate dynamic programming algorithms for policy optimization in SDS. These algorithms are combined with a method for learning a sparse representation of the value function. Experimental results show that, when applied to dialogue management optimization, these algorithms are particularly sample efficient: they learn from a few hundred dialogue examples. They also learn off-policy, meaning that they can learn optimal policies from dialogue examples generated with a quite simple strategy. They can thus learn good dialogue policies directly from data, avoiding user modeling errors.
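
To make the batch, off-policy idea in the abstract concrete, here is a minimal Python sketch of fitted Q iteration, one well-known approximate dynamic programming scheme. The abstract does not name the specific algorithms used in the paper, so this choice is an assumption. The two-slot dialogue task, the 80% recognition rate, the reward values, and all function names are hypothetical and serve only to produce a fixed batch of logged turns. A one-hot (tabular) representation stands in for the sparse value-function representation mentioned in the abstract, which is not reproduced here.

```python
import random

ACTIONS = ("ask", "close")
N_SLOTS = 2      # hypothetical toy task: fill two slots, then close the dialogue
GAMMA = 0.95     # discount factor (assumed value)


def simulate_turn(state, action, rng):
    """Hypothetical toy environment, used only to produce a fixed batch."""
    if action == "close":
        reward = 1.0 if state == N_SLOTS else -1.0
        return reward, None              # None marks the end of the dialogue
    understood = rng.random() < 0.8      # assumed 80% recognition rate
    next_state = min(state + 1, N_SLOTS) if understood else state
    return -0.1, next_state              # small cost for every extra turn


def collect_batch(n_dialogues, rng, max_turns=20):
    """Log (s, a, r, s') transitions under a uniformly random strategy."""
    batch = []
    for _ in range(n_dialogues):
        state = 0
        for _ in range(max_turns):
            action = rng.choice(ACTIONS)
            reward, next_state = simulate_turn(state, action, rng)
            batch.append((state, action, reward, next_state))
            if next_state is None:
                break
            state = next_state
    return batch


def fitted_q_iteration(batch, n_iterations=50):
    """Repeatedly regress Q onto bootstrapped targets built from the batch."""
    q = {(s, a): 0.0 for s in range(N_SLOTS + 1) for a in ACTIONS}
    for _ in range(n_iterations):
        sums = {key: 0.0 for key in q}
        counts = {key: 0 for key in q}
        for s, a, r, s_next in batch:
            # Target uses the greedy value of the next state (off-policy).
            best_next = 0.0 if s_next is None else max(
                q[(s_next, b)] for b in ACTIONS)
            sums[(s, a)] += r + GAMMA * best_next
            counts[(s, a)] += 1
        # With one-hot features, least-squares regression reduces to the
        # mean target of each (state, action) pair.
        q = {key: sums[key] / counts[key] if counts[key] else q[key]
             for key in q}
    return q


def greedy_policy(q):
    return {s: max(ACTIONS, key=lambda a: q[(s, a)])
            for s in range(N_SLOTS + 1)}


batch = collect_batch(300, random.Random(0))
q_values = fitted_q_iteration(batch)
policy = greedy_policy(q_values)
print(policy)
```

The simulator is never queried during learning: only the logged transitions are used, and they come from a random strategy that is itself a poor dialogue policy. In this toy setting the greedy policy would be expected to keep asking until both slots are filled and then close, which is the kind of off-policy behavior the abstract describes.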

Contributor: Sébastien Van Luchene

https://centralesupelec.hal.science/hal-00617517

Submitted on: Monday, 29 August 2011, 13:51:41

Last modified on: Thursday, 13 April 2023, 09:26:12

Dates and versions

hal-00617517 , version 1 (29-08-2011)

Identifiers

HAL Id: hal-00617517, version 1
DOI: 10.1145/1966407.1966412

Cite

Olivier Pietquin, Matthieu Geist, Senthilkumar Chandramohan, Hervé Frezza-Buet. Sample-Efficient Batch Reinforcement Learning for Dialogue Management Optimization. ACM - Transactions on Speech and Language Processing, 2011, 7 (3), pp.art. 7 (1-21). ⟨10.1145/1966407.1966412⟩. ⟨hal-00617517⟩


Collections

SUPELEC CNRS UNIV-FCOMTE CENTRALESUPELEC UMI-GTL

