High-dimensional Statistical Modeling Team Seminar (Talk by Dorian Baudry (CNRS/INRIA))

Mon, 04 Jul 2022 15:00 - 16:00 JST
Online (link visible to participants)
Registration is closed

Free admission

Description

Speaker: Dorian Baudry (CNRS/INRIA)

Title: Optimal Thompson Sampling Strategies for Support-Aware CVaR Bandits

Abstract:
In this presentation we will introduce a multi-armed bandit algorithm proposed in Baudry et al.
(2021). A multi-armed bandit is a sequential decision-making problem in which, at each time step,
a learner: (1) selects an action, (2) observes a reward corresponding to this action, and (3)
updates her policy to choose future actions in order to maximize the expected sum of rewards. The
main difficulty is then to find a strategy with the right balance between exploration and
exploitation. Motivated by an application of bandits in agriculture, we consider a risk-aware
variant of this problem in which the quality of each action is evaluated by its Conditional Value at
Risk (CVaR) at some given quantile of the reward distribution. After describing the problem and
illustrating the potential applications in agriculture in the first part of the talk, we will
introduce the Bounded CVaR Thompson Sampling algorithm (B-CVTS), which we prove to be the first
asymptotically optimal algorithm for CVaR bandits for distributions with bounded support. We will
then showcase the main theorems and elements of analysis presented in the paper. Finally, we will
discuss the experiments we implemented using the Decision Support System for Agrotechnology
Transfer (DSSAT), illustrating empirically the benefit of Thompson Sampling approaches in a
realistic environment simulating a use-case in agriculture.
Link to the article: https://proceedings.mlr.press/v139/baudry21a.html
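
The learner loop and the CVaR criterion described in the abstract can be illustrated with a short Python sketch. This is not the B-CVTS algorithm from the paper: it swaps in a plain epsilon-greedy rule on the empirical CVaR, purely to show the select / observe / update cycle. The function names, the two toy arms, and all parameter values (alpha, epsilon, horizon, seed) are assumptions made for illustration, and the lower-tail CVaR convention for rewards is assumed here.

```python
import random


def empirical_cvar(rewards, alpha):
    """Empirical lower-tail CVaR at level alpha (assumed convention for rewards).

    Computes sup_x { x - mean((x - r)^+) / alpha }. The objective is concave and
    piecewise linear in x, so the supremum is attained at one of the samples.
    """
    n = len(rewards)
    return max(
        x - sum(max(x - r, 0.0) for r in rewards) / (alpha * n)
        for x in rewards
    )


def run_epsilon_greedy_cvar(arms, alpha, horizon, epsilon=0.1, seed=0):
    """Toy bandit loop (NOT B-CVTS): epsilon-greedy on the empirical CVaR."""
    rng = random.Random(seed)
    history = [[] for _ in arms]  # observed rewards per arm
    for t in range(horizon):
        if t < len(arms):
            k = t  # pull every arm once to initialise
        elif rng.random() < epsilon:
            k = rng.randrange(len(arms))  # explore
        else:
            # exploit: arm with the highest empirical CVaR so far
            k = max(range(len(arms)),
                    key=lambda i: empirical_cvar(history[i], alpha))
        history[k].append(arms[k](rng))  # observe a reward and update
    return history


# Two hypothetical arms with rewards bounded in [0, 1].
arms = [
    lambda rng: rng.uniform(0.4, 0.6),               # mean 0.5, low risk
    lambda rng: 1.0 if rng.random() < 0.8 else 0.0,  # mean 0.8, risky
]
history = run_epsilon_greedy_cvar(arms, alpha=0.1, horizon=200)
pulls = [len(h) for h in history]
print(pulls)
```

Under these assumptions the second arm has the higher mean (0.8 against 0.5), but its true CVaR at level 0.1 is 0 because it returns 0 with probability 0.2, while the first arm's is 0.41. A CVaR-driven learner should therefore tend to favour the first arm over time, which is the kind of risk-averse preference the talk is about.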

About this community

RIKEN AIP Public

Public events of RIKEN Center for Advanced Intelligence Project (AIP)
