An algorithmic Survey of Parametric Value Function Approximation

Matthieu Geist 1, Olivier Pietquin 2
2 IMS - Equipe Information, Multimodalité et Signal, UMI 2958 - Georgia Tech - CNRS [Metz], SUPELEC - Campus Metz
Abstract: Reinforcement learning is a machine learning answer to the optimal control problem. It consists of learning an optimal control policy through interactions with the system to be controlled, the quality of this policy being quantified by the so-called value function. A recurrent subtopic of reinforcement learning is computing an approximation of this value function when the system is too large for an exact representation. This survey reviews state-of-the-art methods for (parametric) value function approximation by grouping them into three main categories: bootstrapping, residual, and projected fixed-point approaches. Related algorithms are derived by considering one of the associated cost functions together with a specific minimization method, generally a stochastic gradient descent or a recursive least-squares approach.
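To make the pairing of a cost function with a minimization method concrete, the sketch below shows one canonical instance from the bootstrapping family: TD(0) with linear value function approximation, i.e., a stochastic (semi-)gradient descent on the squared temporal-difference error with the bootstrapped target held fixed. This is an illustrative example, not the survey's specific derivation; the function name, feature encoding, and toy chain are assumptions made for the sketch.

```python
import numpy as np

def td0_linear(features, transitions, gamma=0.9, alpha=0.1, n_sweeps=200):
    """TD(0) with linear value function approximation (bootstrapping family).

    The value of state s is approximated as V(s) = theta . phi(s). Each
    observed transition (s, r, s') yields a bootstrapped target
    r + gamma * V(s'), and theta is updated by a stochastic semi-gradient
    step on the squared TD error, treating the target as a constant.
    """
    n_features = features.shape[1]
    theta = np.zeros(n_features)
    for _ in range(n_sweeps):
        for s, r, s_next in transitions:
            v = features[s] @ theta                      # current estimate V(s)
            target = r + gamma * (features[s_next] @ theta)  # bootstrapped target
            theta += alpha * (target - v) * features[s]  # semi-gradient update
    return theta

# Hypothetical toy 2-state chain: state 0 gives reward 1 and moves to
# state 1; state 1 is absorbing with reward 0.
phi = np.eye(2)                      # tabular (one-hot) features
data = [(0, 1.0, 1), (1, 0.0, 1)]
theta = td0_linear(phi, data, gamma=0.5)
# Converges toward V(0) = 1, V(1) = 0 for this chain.
```

With one-hot features the update reduces to the tabular TD(0) rule, which is why the fixed point here matches the exact values; with generic features, theta instead converges to a projection-weighted approximation.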
Document type:
Journal article
IEEE Transactions on Neural Networks and Learning Systems, IEEE, 2013, 24 (6), pp. 845-867. DOI: 10.1109/TNNLS.2013.2247418

Cited literature: 93 references

https://hal-supelec.archives-ouvertes.fr/hal-00869725
Contributor: Sébastien Van Luchene
Submitted on: Monday, November 6, 2017 - 17:33:08
Last modified on: Monday, April 23, 2018 - 14:36:03

File

vfa_survey.pdf
Files produced by the author(s)

Identifiers

Citation

Matthieu Geist, Olivier Pietquin. An algorithmic Survey of Parametric Value Function Approximation. IEEE Transactions on Neural Networks and Learning Systems, IEEE, 2013, 24 (6), pp. 845-867. DOI: 10.1109/TNNLS.2013.2247418. HAL: hal-00869725.


Metrics

Record views: 131
File downloads: 34