Minimax Optimal Algorithms for Adversarial Bandit Problem With Multiple Plays

Bibliographic Details
Published in:	IEEE Transactions on Signal Processing, vol. 67, no. 16, pp. 4383-4398
Main Authors:	Vural, Nuri Mert, Gokcesu, Hakan, Gokcesu, Kaan, Kozat, Suleyman S.
Format:	Journal Article
Language:	English
Published:	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 15-08-2019
Subjects:	Adversarial multi-armed bandit; Algorithms; Computational modeling; Games; individual sequence manner; minimax optimal; Minimax technique; Multi-armed bandit problems; multiple plays; Performance gain; Signal processing algorithms; Statistical analysis; Switches; switching bandit; Time complexity

Description
Summary:	We investigate the adversarial bandit problem with multiple plays under semi-bandit feedback. We introduce a highly efficient algorithm that asymptotically achieves the performance of the best switching $m$-arm strategy with minimax optimal regret bounds. To construct our algorithm, we introduce a new expert advice algorithm for the multiple-play setting. By using our expert advice algorithm, we additionally improve the best-known high-probability bound for the multi-play setting by $O(\sqrt{m})$. Our results are guaranteed to hold in an individual sequence manner since we have no statistical assumption on the bandit arm gains. Through an extensive set of experiments involving synthetic and real data, we demonstrate significant performance gains achieved by the proposed algorithm with respect to the state-of-the-art algorithms.
ISSN:	1053-587X, 1941-0476
DOI:	10.1109/TSP.2019.2928952