A multiplicative UCB strategy for Gamma rewards

Matthieu Geist

Conference paper. Year: 2015

Role: Corresponding author
PersonId: 6945
IdHAL: matthieu-geist

MAchine Learning and Interactive Systems

Abstract

We consider the stochastic multi-armed bandit problem in which rewards follow Gamma distributions that are unknown up to a lower bound on the form factor. To handle this problem, we propose a UCB-like strategy whose indices are multiplicative (the sample mean times a scaling factor). We provide an upper bound on the associated regret and illustrate the proposed strategy with some simple experiments.
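The idea of a multiplicative index can be sketched in Python. The abstract does not give the scaling factor, so the one used here, 1 / (1 - eps) with eps = sqrt(c log t / (k_min n)), is a hypothetical stand-in and not the factor derived in the paper. The sketch also assumes that the "form factor" is the Gamma shape parameter. The names `multiplicative_index`, `multiplicative_ucb`, `shape_lower_bound` and the constant `c` are illustrative.

```python
import math
import random


def multiplicative_index(mean, n_pulls, t, shape_lower_bound, c=2.0):
    """Sample mean times a scaling factor (hypothetical factor, not the paper's)."""
    if n_pulls == 0:
        return math.inf  # force every arm to be tried at least once
    # Relative confidence width; it shrinks with more pulls and a larger shape bound.
    eps = math.sqrt(c * math.log(t) / (shape_lower_bound * n_pulls))
    if eps >= 1.0:
        return math.inf  # too few pulls for a finite multiplicative bound
    return mean / (1.0 - eps)


def multiplicative_ucb(arms, horizon, shape_lower_bound, seed=0):
    """Run the sketch on Gamma arms given as (shape, scale) pairs.

    Returns the number of pulls of each arm.
    """
    rng = random.Random(seed)
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for t in range(1, horizon + 1):
        indices = [
            multiplicative_index(
                sums[i] / counts[i] if counts[i] else 0.0,
                counts[i],
                t,
                shape_lower_bound,
            )
            for i in range(len(arms))
        ]
        # Pull the arm with the largest index (first one on ties).
        arm = max(range(len(arms)), key=indices.__getitem__)
        shape, scale = arms[arm]
        reward = rng.gammavariate(shape, scale)  # mean = shape * scale
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    # Two arms with the same shape (2.0) and means 1.0 and 3.0.
    pulls = multiplicative_ucb([(2.0, 0.5), (2.0, 1.5)], horizon=2000,
                               shape_lower_bound=2.0)
    print(pulls)
```

Because the bonus multiplies the sample mean instead of being added to it, the index scales with the rewards themselves, which suits a distribution like the Gamma whose standard deviation is proportional to its mean.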

Keywords

Stochastic multi-armed bandits; Gamma-distributed rewards

Domains

Machine Learning [cs.LG]

Main file

gamma_ucb.pdf (481.8 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-01258820

Submitted on: Tuesday, January 19, 2016, 15:07:21

Last modified on: Thursday, March 9, 2023, 10:30:17

Long-term archiving on: Friday, November 11, 2016, 12:40:27

Dates and versions

hal-01258820, version 1 (19-01-2016)

Identifiers

HAL Id: hal-01258820, version 1

Cite

Matthieu Geist. A multiplicative UCB strategy for Gamma rewards. European Workshop on Reinforcement Learning, 2015, Lille, France. ⟨hal-01258820⟩

Collections

CENTRALESUPELEC MALIS UMI-GTL
