Conference paper, Year: 2013

Learning from Demonstrations: Is It Worth Estimating a Reward Function?

Abstract

This paper provides a comparative study of Inverse Reinforcement Learning (IRL) and Apprenticeship Learning (AL). Both frameworks rely on Markov Decision Processes (MDPs) and address the imitation learning problem, in which an agent tries to learn from demonstrations of an expert. In the AL framework, the agent tries to learn the expert policy directly, whereas in the IRL framework, the agent tries to learn a reward that can explain the behavior of the expert. This reward is then optimized to imitate the expert. One can wonder whether it is worth estimating such a reward, or whether estimating a policy is sufficient. This quite natural question has not really been addressed in the literature so far. We provide partial answers, from both a theoretical and an empirical point of view.
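
The two routes described in the abstract can be illustrated on a toy problem. The sketch below is not taken from the paper: the five-state chain, the demonstrations, the majority-vote policy estimate, and the indicator reward are all hypothetical choices made for illustration, and the reward estimate in particular is a crude stand-in rather than an actual IRL algorithm. It only shows the structural difference: the AL-style route estimates a policy directly, while the IRL-style route estimates a reward and then optimizes it.

```python
from collections import Counter, defaultdict

# Hypothetical toy chain MDP: 5 states, actions 0 = left, 1 = right.
N_STATES, ACTIONS, GAMMA = 5, (0, 1), 0.9

def step(s, a):
    """Deterministic chain dynamics: move left or right, clipped at the ends."""
    return max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)

# Hypothetical expert demonstrations as (state, action) pairs: always move right.
demos = [(s, 1) for s in range(N_STATES)] * 3

def majority_actions(demos):
    """Most frequent expert action observed in each visited state."""
    votes = defaultdict(Counter)
    for s, a in demos:
        votes[s][a] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

def policy_from_demos(demos):
    """AL-style route: estimate the policy directly (per-state majority vote)."""
    best = majority_actions(demos)
    return [best.get(s, ACTIONS[0]) for s in range(N_STATES)]

def reward_from_demos(demos):
    """IRL-style route, step 1: stand-in reward, 1 for the expert's action, else 0."""
    best = majority_actions(demos)
    return {(s, a): float(best.get(s) == a)
            for s in range(N_STATES) for a in ACTIONS}

def optimize(reward, n_iter=200):
    """IRL-style route, step 2: value iteration on the estimated reward."""
    v = [0.0] * N_STATES
    for _ in range(n_iter):
        v = [max(reward[s, a] + GAMMA * v[step(s, a)] for a in ACTIONS)
             for s in range(N_STATES)]
    # Greedy policy with respect to the converged values.
    return [max(ACTIONS, key=lambda a: reward[s, a] + GAMMA * v[step(s, a)])
            for s in range(N_STATES)]

al_policy = policy_from_demos(demos)
irl_policy = optimize(reward_from_demos(demos))
```

On this toy problem, with every state covered by the demonstrations, both routes recover the same policy; the sketch says nothing about which route is preferable in general.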
Origin: Files produced by the author(s)

Dates and versions

hal-00869801, version 1 (06-11-2017)

Cite

Bilal Piot, Matthieu Geist, Olivier Pietquin. Learning from Demonstrations: Is It Worth Estimating a Reward Function?. Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD 2013), Sep 2013, Prague, Czech Republic. pp.17-32, ⟨10.1007/978-3-642-40988-2_2⟩. ⟨hal-00869801⟩
