
A Brief Look at the Multi-Armed Bandit Problem and Its Engineering Applications

Setareh Maghsudi (TU Berlin)


The multi-armed bandit (MAB) problem is a class of sequential optimization problems in which, given a set of arms, a player pulls one arm in each round and receives a reward drawn from that arm's unknown reward-generating process. In this uncertain setting, the player may lose some reward in each round by playing an arm other than the best one; this loss is referred to as regret. Over the sequence of rounds, the player decides which arm to pull so as to minimize its average accumulated regret over the horizon, or to maximize its discounted reward. MAB admits a variety of formulations and has proved to be a powerful mathematical tool for a wide range of engineering problems.
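To make the arm-pulling and regret notions concrete, here is a minimal illustrative sketch (not part of the talk) of the classical UCB1 index policy on a Bernoulli bandit; the arm means and horizon are made-up example values:

```python
# Illustrative sketch: UCB1 on a Bernoulli bandit. Each round the player
# pulls the arm maximising (empirical mean + confidence bonus); regret is
# the expected loss versus always playing the best arm.
# Arm means and horizon below are hypothetical example values.
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1; return total reward and cumulative pseudo-regret."""
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms      # number of pulls per arm
    sums = [0.0] * n_arms      # accumulated reward per arm
    best_mean = max(means)
    total_reward = 0.0
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1        # initialise: pull each arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
        regret += best_mean - means[arm]  # expected loss vs. best arm
    return total_reward, regret


if __name__ == "__main__":
    reward, regret = ucb1([0.2, 0.5, 0.8], horizon=5000)
    print(f"total reward: {reward:.0f}, cumulative regret: {regret:.1f}")
```

Because the confidence bonus shrinks as an arm is pulled more often, suboptimal arms are sampled only logarithmically often, so the cumulative regret grows much more slowly than the horizon.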

In this talk, I first present a brief tutorial on multi-armed bandits, covering basic notions and various settings, together with some results for the multi-agent setting. Afterwards, I review some engineering applications as well as future research directions.


Setareh Maghsudi
TEL 512

