A Non-Parametric Approach to Approximate Dynamic Programming

Hadrien Glaude; Fadi Akrimi; Matthieu Geist; Olivier Pietquin

Conference paper, Year: 2011

Hadrien Glaude

Role: Author
PersonId : 9894
IdHAL : hadrien-glaude
IdRef : 197825966

IMS : Information, Multimodalité & Signal

Fadi Akrimi

Role: Author

IMS : Information, Multimodalité & Signal

Matthieu Geist

Role: Author
PersonId : 6945
IdHAL : matthieu-geist

IMS : Information, Multimodalité & Signal

Olivier Pietquin

Role: Author
PersonId : 4024
IdHAL : olivier-pietquin
ORCID : 0000-0002-5386-465X
IdRef : 142821861

IMS : Information, Multimodalité & Signal

Abstract

Approximate Dynamic Programming (ADP) is a machine learning method that aims to learn an optimal control policy for a dynamic, stochastic system from a logged set of observed interactions between the system and one or several non-optimal controllers. It defines a particular class of Reinforcement Learning (RL) algorithms, RL being the general paradigm for learning such a control policy from interactions. ADP addresses systems whose state space is too large to be enumerated in computer memory. Approximation schemes are therefore used to generalize estimates over continuous state spaces. Nevertheless, RL still scales poorly to multidimensional continuous state spaces. In this paper, we propose using the Locally Weighted Projection Regression (LWPR) method to handle this scalability problem. We demonstrate the efficacy of our approach on two standard benchmarks modified to exhibit larger state spaces.
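The general idea described in the abstract, learning a value function from logged interactions with a non-parametric regressor, can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses fitted Q iteration with a plain Gaussian kernel (Nadaraya-Watson) regressor as a simple stand-in for LWPR, and the one-dimensional dynamics, reward, bandwidth, and all function names are hypothetical choices made only for illustration.

```python
import math
import random

GAMMA = 0.9          # discount factor (assumed)
ACTIONS = (-1, 1)    # move left or right (hypothetical toy problem)
BANDWIDTH = 0.1      # kernel width (assumed, not tuned)


def step(state, action, rng):
    """Hypothetical toy dynamics: noisy move on [0, 1], reward near the right end."""
    nxt = min(1.0, max(0.0, state + 0.1 * action + rng.gauss(0.0, 0.02)))
    return nxt, (1.0 if nxt > 0.9 else 0.0)


def collect_log(n, rng):
    """Logged transitions (s, a, r, s') from a uniformly random, non-optimal controller."""
    log = []
    for _ in range(n):
        s = rng.random()
        a = rng.choice(ACTIONS)
        s2, r = step(s, a, rng)
        log.append((s, a, r, s2))
    return log


def kernel_predict(query, xs, ys):
    """Nadaraya-Watson estimate: Gaussian-weighted average of targets near the query."""
    weights = [math.exp(-0.5 * ((query - x) / BANDWIDTH) ** 2) for x in xs]
    total = sum(weights)
    if total < 1e-12:
        return 0.0
    return sum(w * y for w, y in zip(weights, ys)) / total


def fitted_q_iteration(log, n_iters=20):
    """Repeatedly regress Bellman targets r + gamma * max_b Q(s', b) on the logged states."""
    data = {a: [(s, r, s2) for (s, b, r, s2) in log if b == a] for a in ACTIONS}
    xs = {a: [s for (s, _, _) in data[a]] for a in ACTIONS}
    targets = {a: [r for (_, r, _) in data[a]] for a in ACTIONS}  # first pass: Q = reward
    for _ in range(n_iters):
        targets = {
            a: [
                r + GAMMA * max(kernel_predict(s2, xs[b], targets[b]) for b in ACTIONS)
                for (_, r, s2) in data[a]
            ]
            for a in ACTIONS
        }
    # Return the learned Q-function as a closure over the stored samples.
    return lambda s, a: kernel_predict(s, xs[a], targets[a])


rng = random.Random(0)
q = fitted_q_iteration(collect_log(200, rng))
greedy_action = max(ACTIONS, key=lambda a: q(0.5, a))
print("Greedy action at s = 0.5:", greedy_action)
```

The regressor here stores every sample and weights all of them on each query, which is exactly the kind of cost that grows quickly with the size and dimension of the state space; a method such as LWPR would replace `kernel_predict` in this sketch.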

Domains

Machine Learning [cs.LG]

Main file

ICMLA_2011_HGFAMGOP.pdf (595.46 KB)

Origin: Files produced by the author(s)

Contributor: Sébastien Van Luchene

https://centralesupelec.hal.science/hal-00652438

Submitted on: Thursday, 15 December 2011, 15:51:30

Last modified on: Tuesday, 14 February 2023, 04:20:22

Long-term archiving on: Monday, 5 December 2016, 08:45:38

Dates and versions

hal-00652438 , version 1 (15-12-2011)

Identifiers

HAL Id : hal-00652438 , version 1

Cite

Hadrien Glaude, Fadi Akrimi, Matthieu Geist, Olivier Pietquin. A Non-Parametric Approach to Approximate Dynamic Programming. ICMLA 2011, Dec 2011, Honolulu, Hawaii, United States. pp.1-6. ⟨hal-00652438⟩

Collections

SUPELEC CENTRALESUPELEC
