Conference paper, Year: 2015

A multiplicative UCB strategy for Gamma rewards

Abstract

We consider the stochastic multi-armed bandit problem in which rewards follow Gamma distributions whose parameters are unknown, except for a known lower bound on the shape parameter. To handle this problem, we propose a UCB-like strategy whose indices are multiplicative: each index is the sample mean of an arm times a scaling factor. We provide an upper bound on the regret of this strategy and illustrate it with some simple experiments.
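The idea of a multiplicative index can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the strategy from the paper: the abstract does not give the scaling factor, so `scaling_factor` below is a hypothetical placeholder chosen only to be greater than 1 and to shrink as an arm is pulled more often. The names `multiplicative_ucb` and `shape_lower_bound` are also illustrative.

```python
import math
import random


def scaling_factor(t, n_pulls, shape_lower_bound):
    """Hypothetical exploration factor, always greater than 1.

    Assumption: this form is a placeholder for illustration only;
    it is NOT the scaling factor derived in the paper.
    """
    return 1.0 + math.sqrt(2.0 * math.log(t) / (shape_lower_bound * n_pulls))


def multiplicative_ucb(arms, horizon, shape_lower_bound, seed=0):
    """Run a multiplicative UCB-like strategy on Gamma-distributed arms.

    arms: list of (shape, scale) pairs, hidden from the learner.
    shape_lower_bound: known lower bound on every arm's shape parameter.
    Returns the number of pulls of each arm after `horizon` rounds.
    """
    rng = random.Random(seed)
    n_arms = len(arms)
    pulls = [0] * n_arms
    sums = [0.0] * n_arms

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # Index = sample mean times a scaling factor (multiplicative,
            # in contrast to the additive bonus of standard UCB).
            indices = [
                (sums[a] / pulls[a])
                * scaling_factor(t, pulls[a], shape_lower_bound)
                for a in range(n_arms)
            ]
            arm = max(range(n_arms), key=indices.__getitem__)

        shape, scale = arms[arm]
        reward = rng.gammavariate(shape, scale)  # Gamma(shape, scale) draw
        pulls[arm] += 1
        sums[arm] += reward

    return pulls


if __name__ == "__main__":
    # Three arms with means shape * scale = 1.0, 2.0 and 3.0.
    example_arms = [(2.0, 0.5), (2.0, 1.0), (3.0, 1.0)]
    print(multiplicative_ucb(example_arms, horizon=2000, shape_lower_bound=2.0))
```

A multiplicative form is natural here because the ratio of the sample mean to the true mean of a Gamma arm has a distribution that depends only on the shape parameter and the number of pulls, not on the scale. A lower bound on the shape is therefore enough to control the relative error of the sample mean.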
Main file
gamma_ucb.pdf (481.8 KB)
Origin: files produced by the author(s)

Dates and versions

hal-01258820, version 1 (19-01-2016)

Identifiers

  • HAL Id: hal-01258820, version 1

Cite

Matthieu Geist. A multiplicative UCB strategy for Gamma rewards. European Workshop on Reinforcement Learning, 2015, Lille, France. ⟨hal-01258820⟩