A multiplicative UCB strategy for Gamma rewards
Abstract
We consider the stochastic multi-armed bandit problem in which rewards follow Gamma distributions that are unknown except for a lower bound on the shape parameter. To handle this problem, we propose a UCB-like strategy whose indices are multiplicative: each index is the sample mean times a scaling factor. We provide an upper bound on the associated regret and illustrate the proposed strategy with simple experiments.
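The idea of a multiplicative index can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the scaling factor, so the form used here, 1 / (1 - sqrt(2 log t / (a n))), with a the known lower bound from the abstract and n the number of pulls of the arm, is a hypothetical choice and not the formula analysed in the paper. The names `scaling_factor`, `multiplicative_index`, and `run_bandit` are likewise invented for this example.

```python
import math
import random


def scaling_factor(t, n_pulls, shape_lower_bound):
    """Hypothetical inflation factor (> 1); NOT the paper's exact formula."""
    width = math.sqrt(2.0 * math.log(t) / (shape_lower_bound * n_pulls))
    # If the relative width reaches 1, the factor is unbounded: force exploration.
    return math.inf if width >= 1.0 else 1.0 / (1.0 - width)


def multiplicative_index(sample_mean, t, n_pulls, shape_lower_bound):
    """Index = sample mean times a scaling factor (instead of mean plus a bonus)."""
    return sample_mean * scaling_factor(t, n_pulls, shape_lower_bound)


def run_bandit(arms, horizon, shape_lower_bound, seed=0):
    """Simulate the strategy on Gamma arms given as (shape, scale) pairs.

    Returns the number of pulls of each arm after `horizon` rounds.
    """
    rng = random.Random(seed)
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for t in range(1, horizon + 1):
        if t <= len(arms):
            k = t - 1  # initialisation: pull each arm once
        else:
            indices = [
                multiplicative_index(sums[i] / counts[i], t, counts[i], shape_lower_bound)
                for i in range(len(arms))
            ]
            # Play the arm with the largest index (ties go to the lowest arm number).
            k = max(range(len(arms)), key=indices.__getitem__)
        shape, scale = arms[k]
        sums[k] += rng.gammavariate(shape, scale)  # reward with mean shape * scale
        counts[k] += 1
    return counts


if __name__ == "__main__":
    # Two arms with the same shape (2.0) and means 2.0 and 6.0.
    print(run_bandit([(2.0, 1.0), (2.0, 3.0)], horizon=2000, shape_lower_bound=2.0))
```

A multiplicative factor is a natural fit here because the relative spread of a Gamma sample mean depends only on the shape parameter and the number of samples, not on the unknown scale, so a lower bound on the shape is enough to calibrate the factor.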
Domains
Machine Learning [cs.LG]
Origin: Files produced by the author(s)