Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search

Abstract: Local Policy Search is a popular reinforcement learning approach for handling large state spaces. Formally, it searches locally in a parameterized policy space in order to maximize the associated value function averaged over some pre-defined distribution. The best one can hope for in general from such an approach is a local optimum of this criterion. The first contribution of this article is the following surprising result: if the policy space is convex, any (approximate) local optimum enjoys a global performance guarantee. Unfortunately, the convexity assumption is strong: it is not satisfied by commonly used parameterizations, and designing a parameterization that induces this property seems hard. A natural solution to alleviate this issue consists in deriving an algorithm that solves the local policy search problem using a boosting approach (constrained to the convex hull of the policy space). The resulting algorithm turns out to be a slight generalization of conservative policy iteration; thus, our second contribution is to highlight an original connection between local policy search and approximate dynamic programming.
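
As a rough formalization of the objects named in the abstract, the following sketch may help. The notation is an assumption chosen for illustration and is not taken from this record: ν stands for the pre-defined state distribution, v_π for the value function of a policy π, Π for the policy space, and α_k for a step size. Mixtures of policies are understood state-wise, on the action distributions.

```latex
% Notation assumed for illustration; none of these symbols appear in the record.
% Criterion maximized by local policy search (value averaged over \nu):
J_\nu(\pi) = \mathbb{E}_{s \sim \nu}\left[ v_\pi(s) \right], \qquad \pi \in \Pi

% Convexity of the policy space:
\forall\, \pi, \pi' \in \Pi,\ \forall\, \alpha \in [0,1]: \quad
(1-\alpha)\,\pi + \alpha\,\pi' \in \Pi

% Mixture update in the style of conservative policy iteration:
\pi_{k+1} = (1-\alpha_k)\,\pi_k + \alpha_k\,\pi'_k, \qquad \alpha_k \in [0,1]
```

In conservative policy iteration as it is usually presented, π'_k is an (approximately) greedy policy with respect to the value of π_k; the exact form of the generalization studied in the paper is not given in the abstract and is left unspecified here.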
Document type:
Conference paper
ECMLPKDD 2014, Sep 2014, Nancy, France. Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 8726, pp.35 - 50, 2014, Lecture Notes in Computer Science. 〈10.1007/978-3-662-44845-8_3〉

Cited literature: 25 references

https://hal-supelec.archives-ouvertes.fr/hal-01086345
Contributor: Sébastien Van Luchene
Submitted on: Monday, 24 November 2014 - 08:38:59
Last modified on: Thursday, 11 January 2018 - 06:21:19
Document(s) archived on: Wednesday, 25 February 2015 - 10:15:51

File

supelec886.pdf
Files produced by the author(s)

Citation

Bruno Scherrer, Matthieu Geist. Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search. ECMLPKDD 2014, Sep 2014, Nancy, France. Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 8726, pp.35 - 50, 2014, Lecture Notes in Computer Science. 〈10.1007/978-3-662-44845-8_3〉. 〈hal-01086345〉
