Difference of Convex Functions Programming Applied to Control with Expert Data - UMI 2958 - Research area: Computer Science
Preprint, Working Paper. Year: 2017

Difference of Convex Functions Programming Applied to Control with Expert Data

Bilal Piot
Matthieu Geist
Olivier Pietquin

Abstract

This paper reports applications of Difference of Convex functions (DC) programming to Learning from Demonstrations (LfD) and Reinforcement Learning (RL) with expert data. This is made possible because the norm of the Optimal Bellman Residual (OBR), which is at the heart of many RL and LfD algorithms, is DC. Improvement in performance is demonstrated on two specific algorithms, namely Reward-regularized Classification for Apprenticeship Learning (RCAL) and Reinforcement Learning with Expert Demonstrations (RLED), through experiments on generic Markov Decision Processes (MDP), called Garnets.
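
The claim that the norm of the OBR is DC can be illustrated with a small numerical sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from this record: a tabular Q-function, a uniformly weighted L1 norm over all state-action pairs, and a small random MDP that only loosely imitates a Garnet. All function names are hypothetical. Writing g(Q) for the optimal Bellman backup (convex in Q, since it is a non-negatively weighted sum of maxima of linear functions) and h(Q) = Q(s, a) (linear), the identity |g - h| = 2 max(g, h) - (g + h) expresses the residual norm as a difference of two convex functions.

```python
import random


def make_random_mdp(n_states, n_actions, branching, seed=0):
    """Small random MDP, loosely Garnet-like (hypothetical helper)."""
    rng = random.Random(seed)
    P, R = {}, {}
    for s in range(n_states):
        for a in range(n_actions):
            # Each (s, a) reaches `branching` distinct successor states.
            successors = rng.sample(range(n_states), branching)
            weights = [rng.random() for _ in successors]
            total = sum(weights)
            P[(s, a)] = [(s2, w / total) for s2, w in zip(successors, weights)]
            R[(s, a)] = rng.random()
    return P, R


def random_q(n_states, n_actions, seed):
    """Random tabular Q-function stored as a dict keyed by (state, action)."""
    rng = random.Random(seed)
    return {(s, a): rng.uniform(-1.0, 1.0)
            for s in range(n_states) for a in range(n_actions)}


def bellman_backup(Q, P, R, gamma, n_actions, s, a):
    """g(Q) = R(s,a) + gamma * E[max_b Q(s',b)], which is convex in Q."""
    return R[(s, a)] + gamma * sum(
        p * max(Q[(s2, b)] for b in range(n_actions))
        for s2, p in P[(s, a)])


def obr_l1(Q, P, R, gamma, n_actions):
    """L1 norm of the optimal Bellman residual, computed directly."""
    return sum(abs(bellman_backup(Q, P, R, gamma, n_actions, s, a) - Q[(s, a)])
               for (s, a) in Q)


def obr_l1_dc_parts(Q, P, R, gamma, n_actions):
    """Return (f1, f2), both convex in Q, such that obr_l1 = f1 - f2."""
    f1 = 0.0
    f2 = 0.0
    for (s, a) in Q:
        g = bellman_backup(Q, P, R, gamma, n_actions, s, a)
        h = Q[(s, a)]
        f1 += 2.0 * max(g, h)  # maximum of convex functions is convex
        f2 += g + h            # sum of a convex and a linear function
    return f1, f2


# Assumed toy sizes; the two printed numbers should agree up to rounding.
P, R = make_random_mdp(n_states=6, n_actions=3, branching=2, seed=0)
Q = random_q(6, 3, seed=1)
f1, f2 = obr_l1_dc_parts(Q, P, R, 0.9, 3)
print(obr_l1(Q, P, R, 0.9, 3), f1 - f2)
```

A DC programming method would then repeatedly linearize f2 around the current iterate and minimize the resulting convex surrogate. That step is omitted here, and the norm and function approximation used in the paper may differ from this sketch.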
Main file
1606.01128.pdf (499.02 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01629653, version 1 (06-11-2017)

Identifiers

  • HAL Id: hal-01629653, version 1

Cite

Bilal Piot, Matthieu Geist, Olivier Pietquin. Difference of Convex Functions Programming Applied to Control with Expert Data. 2017. ⟨hal-01629653⟩
