A Brief Look at the Multi-Armed Bandit Problem and Its Engineering Applications
Setareh Maghsudi (TU Berlin)
The multi-armed bandit (MAB) problem is a class of sequential optimization
problems in which, given a set of arms, a player pulls one arm at each
round and receives a reward drawn from that arm's unknown
reward-generating process. In such an uncertain setting, at each round
the player may lose some reward by playing an arm other than the best
one. This loss is referred to as regret. The player decides which arm to
pull over the sequence of rounds so that its average accumulated regret
over the horizon is minimized, or, equivalently, its discounted
cumulative reward over the horizon is maximized. MAB admits a variety of
formulations and has proven to be a powerful mathematical tool for
addressing a wide range of engineering problems.
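To make the notions of arms, rewards, and regret concrete, here is a minimal simulation sketch (not part of the talk) of the classic ε-greedy policy on Bernoulli arms; the arm means, exploration rate, and horizon are illustrative choices, not values from the abstract.

```python
import random

def eps_greedy_bandit(means, rounds=10_000, eps=0.1, seed=0):
    """Run epsilon-greedy on Bernoulli arms; return cumulative expected regret."""
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n       # number of pulls per arm
    values = [0.0] * n     # empirical mean reward per arm
    best = max(means)      # mean of the best arm (unknown to the player)
    regret = 0.0
    for _ in range(rounds):
        if rng.random() < eps:
            arm = rng.randrange(n)                        # explore a random arm
        else:
            arm = max(range(n), key=lambda i: values[i])  # exploit the best estimate
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        regret += best - means[arm]  # expected reward lost this round
    return regret
```

For example, `eps_greedy_bandit([0.2, 0.5, 0.8])` yields a regret far below the roughly 3000 that uniformly random play would incur over 10,000 rounds, since the policy quickly concentrates its pulls on the best arm while the fixed ε keeps its regret growing linearly, one reason the talk's more refined policies matter.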
In this talk, I first present a brief tutorial on multi-armed bandits, covering basic notions and various settings. I then discuss some results on the multi-agent setting. Afterwards, I review several engineering applications as well as future research directions.