Title: Sequential Decision Problems with Weak Feedback
Abstract: Sequential decision problems studied in the literature differ in the type of feedback the learner receives and in how much that feedback reveals about the associated rewards. Most prior work studies settings where the feedback from an action reveals the reward associated with that action. However, in many areas such as crowd-sourcing, medical diagnosis, and adaptive resource allocation, feedback from actions may be weak, i.e., it may not reveal any information about rewards at all. Without any information about rewards, it is in general not possible to learn which action is optimal; learning is possible only if the problem structure allows an optimal action to be identified without explicitly knowing the rewards. Our goal is to study this class of problems. Specifically, we study Unsupervised Sequential Selection (USS), where the rewards/losses of the selected actions are never revealed, but the problem structure is amenable to identifying the optimal actions. We also introduce a novel setup named Censored Semi-Bandits (CSB), where the reward observed from an action depends on the amount of resources allocated to it. We develop provably optimal algorithms for the USS and CSB problems and validate their empirical performance on different problem instances derived from synthetic and real datasets.
This talk is based on the following papers:
1. Arun Verma, Manjesh K. Hanawal, Csaba Szepesvari, and Venkatesh Saligrama, 'Online Algorithm for Unsupervised Sensor Selection,' AISTATS 2019.
2. Arun Verma, Manjesh K. Hanawal, and N. Hemachandra, 'Thompson Sampling for Unsupervised Sequential Selection,' ACML 2020.
3. Arun Verma, Manjesh K. Hanawal, Csaba Szepesvari, and Venkatesh Saligrama, 'Online Algorithm for Unsupervised Sequential Selection with Contextual Information,' NeurIPS 2020.
4. Arun Verma, Manjesh K. Hanawal, Arun Rajkumar, and Raman Sankaran, 'Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback,' NeurIPS 2019.
5. Arun Verma and Manjesh K. Hanawal, 'Stochastic Network Utility Maximization with Unknown Utility: Multi-Armed Bandits Approach,' IEEE INFOCOM 2020.
Public events of RIKEN Center for Advanced Intelligence Project (AIP)