Learning from demonstrations: Is it worth estimating a reward function?

Bilal Piot 1, Matthieu Geist 1, Olivier Pietquin 1
1 IMS - Équipe Information, Multimodalité et Signal, UMI 2958 Georgia Tech - CNRS (Metz), SUPELEC Metz Campus
Abstract: This paper provides a comparative study between Inverse Reinforcement Learning (IRL) and Apprenticeship Learning (AL) reduced to classification. IRL and AL are two frameworks for the imitation learning problem, in which an agent tries to learn from demonstrations of an expert. In AL, the agent tries to learn the expert's policy directly, whereas in IRL, the agent tries to learn a reward function that explains the expert's behavior; the policy that is optimal with respect to this reward is then used to imitate the expert. One can wonder whether it is worth estimating such a reward, or whether estimating a policy is sufficient. This quite natural question has so far received little attention in the literature. We provide partial answers, from both theoretical and empirical points of view.
Document type: Conference papers

https://hal-supelec.archives-ouvertes.fr/hal-00916938
Contributor: Sébastien van Luchene
Submitted on: Wednesday, December 11, 2013 - 8:48:38 AM
Last modification on: Wednesday, July 31, 2019 - 4:18:03 PM

Identifiers

  • HAL Id: hal-00916938, version 1

Citation

Bilal Piot, Matthieu Geist, Olivier Pietquin. Learning from demonstrations: Is it worth estimating a reward function? 1st Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2013), Oct 2013, Princeton, New Jersey, United States. ⟨hal-00916938⟩
