Difference of Convex Functions Programming Applied to Control with Expert Data

Bilal Piot; Matthieu Geist; Olivier Pietquin

Preprint, Working Paper. Year: 2017



Bilal Piot

Role: Author

DeepMind [London]

Sequential Learning

Matthieu Geist

Role: Author
PersonId : 6945
IdHAL : matthieu-geist

CentraleSupélec

Olivier Pietquin

Role: Author
PersonId : 4024
IdHAL : olivier-pietquin
ORCID : 0000-0002-5386-465X
IdRef : 142821861

Sequential Learning

DeepMind [London]

Abstract

This paper reports applications of Difference of Convex functions (DC) programming to Learning from Demonstrations (LfD) and Reinforcement Learning (RL) with expert data. This is possible because the norm of the Optimal Bellman Residual (OBR), which is at the heart of many RL and LfD algorithms, is DC. Improved performance is demonstrated on two specific algorithms, namely Reward-regularized Classification for Apprenticeship Learning (RCAL) and Reinforcement Learning with Expert Demonstrations (RLED), through experiments on generic Markov Decision Processes (MDPs) known as Garnets.
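To make the DC claim concrete, the sketch below numerically checks one standard decomposition of the L1 norm of the OBR for a tabular Q-function. It is an illustration under stated assumptions, not necessarily the decomposition used in the paper: the random MDP builder is a hypothetical stand-in for a Garnet, and all function names are invented for this example. The idea is that the optimal Bellman backup T*Q is convex in the parameters of Q (a nonnegative combination of maxima of linear functions), Q itself is linear, and |f - l| = 2 max(f, l) - (f + l), where both 2 max(f, l) and f + l are convex.

```python
import random

def make_random_mdp(n_states, n_actions, branching, seed=0):
    """Hypothetical Garnet-like MDP: each (s, a) reaches `branching` random states."""
    rng = random.Random(seed)
    P, R = {}, {}  # P[(s, a)] = [(next_state, prob), ...]; R[(s, a)] = reward
    for s in range(n_states):
        for a in range(n_actions):
            successors = rng.sample(range(n_states), branching)
            weights = [rng.random() + 1e-3 for _ in successors]
            total = sum(weights)
            P[(s, a)] = [(s2, w / total) for s2, w in zip(successors, weights)]
            R[(s, a)] = rng.random()
    return P, R

def bellman_backup(theta, P, R, gamma, s, a):
    """(T*Q)(s, a) for a tabular Q with Q(s, a) = theta[s][a]; convex in theta."""
    return R[(s, a)] + gamma * sum(p * max(theta[s2]) for s2, p in P[(s, a)])

def obr_l1(theta, P, R, gamma):
    """L1 norm of the optimal Bellman residual T*Q - Q over all (s, a)."""
    return sum(abs(bellman_backup(theta, P, R, gamma, s, a) - theta[s][a])
               for (s, a) in P)

def dc_parts(theta, P, R, gamma):
    """Return convex G and H such that obr_l1(theta) = G(theta) - H(theta)."""
    G = H = 0.0
    for (s, a) in P:
        f = bellman_backup(theta, P, R, gamma, s, a)  # convex in theta
        l = theta[s][a]                               # linear in theta
        G += 2.0 * max(f, l)  # max of convex functions is convex
        H += f + l            # convex plus linear is convex
    return G, H

if __name__ == "__main__":
    gamma = 0.9
    P, R = make_random_mdp(n_states=5, n_actions=3, branching=2, seed=1)
    rng = random.Random(2)
    theta = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(5)]
    G, H = dc_parts(theta, P, R, gamma)
    # The two quantities printed here should agree up to rounding.
    print(obr_l1(theta, P, R, gamma), G - H)
```

Once a function is written as G - H with both parts convex, a DC algorithm can proceed by repeatedly linearizing H at the current iterate and minimizing the resulting convex surrogate; that step is not implemented in this sketch.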

Domains

Machine Learning [stat.ML]

Main file

1606.01128.pdf (499.02 KB)

Origin: Files produced by the author(s)

Contributor: Matthieu GEIST

https://hal.science/hal-01629653

Submitted on: Monday, 6 November 2017, 16:12:31

Last modified on: Wednesday, 24 January 2024, 09:54:23

Dates and versions

hal-01629653 , version 1 (06-11-2017)

Identifiers

HAL Id : hal-01629653 , version 1

Cite

Bilal Piot, Matthieu Geist, Olivier Pietquin. Difference of Convex Functions Programming Applied to Control with Expert Data. 2017. ⟨hal-01629653⟩


Collections

CNRS INRIA CENTRALESUPELEC MALIS UMI-GTL CRISTAL INRIA2 CRISTAL-SEQUEL UNIV-LILLE

