A Dantzig Selector Approach to Temporal Difference Learning

Matthieu Geist 1 Bruno Scherrer 2 Alessandro Lazaric 3 Mohammad Ghavamzadeh 3
1 IMS - Equipe Information, Multimodalité et Signal
UMI2958 - Georgia Tech - CNRS [Metz], SUPELEC-Campus Metz
2 MAIA - Autonomous intelligent machine
Inria Nancy - Grand Est, LORIA - AIS - Department of Complex Systems, Artificial Intelligence & Robotics
3 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract: LSTD is one of the most popular reinforcement learning algorithms for value function approximation. Whenever the number of features is larger than the number of samples, LSTD must be paired with some form of regularization. In particular, L1-regularization methods tend to perform feature selection by promoting sparsity, and thus they are particularly suited to high-dimensional problems. Nonetheless, since LSTD is not a simple regression algorithm but instead solves a fixed-point problem, its integration with L1-regularization is not straightforward and might come with some drawbacks (see, e.g., the P-matrix assumption for LASSO-TD). In this paper we introduce a novel algorithm obtained by integrating LSTD with the Dantzig Selector. In particular, we investigate the performance of the algorithm and its relationship with existing regularized approaches, showing how it overcomes some of the drawbacks of existing solutions.
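As a rough illustration of what integrating LSTD with the Dantzig Selector can look like, the sketch below writes the LSTD fixed-point system and a Dantzig-style relaxation of it. The notation is an assumption made for this illustration and is not taken from this record: \Phi and \Phi' are the n-by-p feature matrices of the sampled states and next states, r is the reward vector, \gamma is the discount factor, and \lambda is a regularization parameter.

```latex
% Empirical LSTD quantities (assumed notation, n samples, p features)
A = \tfrac{1}{n}\,\Phi^{\top}\bigl(\Phi - \gamma\,\Phi'\bigr), \qquad
b = \tfrac{1}{n}\,\Phi^{\top} r

% Plain LSTD solves the linear system
A\,\theta = b

% Dantzig-style relaxation: sparsest parameter (in L1 norm) whose
% residual correlation with every feature is at most \lambda
\hat{\theta}_{\lambda} \in \arg\min_{\theta \in \mathbb{R}^{p}} \;\|\theta\|_{1}
\quad \text{subject to} \quad \|A\,\theta - b\|_{\infty} \le \lambda

% Equivalent linear program, writing \theta = u - v with u, v \ge 0
\min_{u,\,v \,\ge\, 0} \; \mathbf{1}^{\top}(u + v)
\quad \text{subject to} \quad
-\lambda\,\mathbf{1} \;\le\; A\,(u - v) - b \;\le\; \lambda\,\mathbf{1}
```

Under these assumptions the problem is a linear program, so it is feasible whenever \lambda is large enough (\theta = 0 is feasible once \lambda \ge \|b\|_{\infty}) and can be handed to a standard LP solver.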
Document type:
Conference paper
John Langford and Joelle Pineau. ICML-12, Jun 2012, Edinburgh, United Kingdom. Omnipress, pp.1399-1406, 2012

https://hal-supelec.archives-ouvertes.fr/hal-00749480
Contributor: Sébastien Van Luchene
Submitted on: Wednesday, November 7, 2012 - 15:57:28
Last modified on: Thursday, February 21, 2019 - 10:52:49

Identifiers

  • HAL Id: hal-00749480, version 1

Citation

Matthieu Geist, Bruno Scherrer, Alessandro Lazaric, Mohammad Ghavamzadeh. A Dantzig Selector Approach to Temporal Difference Learning. John Langford and Joelle Pineau. ICML-12, Jun 2012, Edinburgh, United Kingdom. Omnipress, pp.1399-1406, 2012. 〈hal-00749480〉
