Gossip-based distributed stochastic bandit algorithms

B. Szorenyi 1, 2 Róbert Busa-Fekete 2, 3 I. Hegedüs 2 R. Ormandi 2 Márk Jelasity 2 Balázs Kégl 4, 5
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
4 TAO - Machine Learning and Optimisation
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique
5 Appstat
LAL - Laboratoire de l'Accélérateur Linéaire, LRI - Laboratoire de Recherche en Informatique
Abstract : The multi-armed bandit problem has attracted remarkable attention in the machine learning community and many efficient algorithms have been proposed to handle the so-called exploitation-exploration dilemma in various bandit setups. At the same time, significantly less effort has been devoted to adapting bandit algorithms to particular architectures, such as sensor networks, multi-core machines, or peer-to-peer (P2P) environments, which could potentially speed up their convergence. Our goal is to adapt stochastic bandit algorithms to P2P networks. In our setup, the same set of arms is available in each peer. In every iteration each peer can pull one arm independently of the other peers, and then some limited communication is possible with a few random other peers. As our main result, we show that our adaptation achieves a linear speedup in terms of the number of peers participating in the network. More precisely, we show that the probability of playing a suboptimal arm at a peer in iteration t=Ω(logN) is proportional to 1/(Nt) where N denotes the number of peers. The theoretical results are supported by simulation experiments showing that our algorithm scales gracefully with the size of network.
Type de document :
Communication dans un congrès
Sanjoy Dasgupta and David McAllester. 30th International Conference on Machine Learning (ICML 2013), Jun 2013, Atlanta, United States. Acm Press, 28, pp.19-27, 2013
Liste complète des métadonnées

Littérature citée [14 références]  Voir  Masquer  Télécharger

http://hal.in2p3.fr/in2p3-00907406
Contributeur : Sabine Starita <>
Soumis le : jeudi 21 novembre 2013 - 11:23:05
Dernière modification le : jeudi 21 février 2019 - 10:52:49
Document(s) archivé(s) le : samedi 22 février 2014 - 04:33:40

Fichier

szorenyi13.pdf
Fichiers éditeurs autorisés sur une archive ouverte

Identifiants

  • HAL Id : in2p3-00907406, version 1

Collections

Citation

B. Szorenyi, Róbert Busa-Fekete, I. Hegedüs, R. Ormandi, Márk Jelasity, et al.. Gossip-based distributed stochastic bandit algorithms. Sanjoy Dasgupta and David McAllester. 30th International Conference on Machine Learning (ICML 2013), Jun 2013, Atlanta, United States. Acm Press, 28, pp.19-27, 2013. 〈in2p3-00907406〉

Partager

Métriques

Consultations de la notice

800

Téléchargements de fichiers

786