Sequential Decision Making Team Seminar (Talk by Canzhe Zhao, Shanghai Jiao Tong University).
This is an online seminar. Registration is required.
【Sequential Decision Making Team】
【Date】October 24, 2025 (Fri) 13:00-14:00 (JST)
【Speaker】Canzhe Zhao, Shanghai Jiao Tong University, Department of Computer Science and Engineering
Title: Scalable Online Learning in Adversarial Environments: from Single-Agent to Multi-Agent
Abstract: Practical applications of sequential decision-making in complex and dynamic environments face critical challenges, including the curse of dimensionality and adversarial loss functions. In this talk, I will present a unified research program on scalable online learning in adversarial environments, addressing these core challenges from both single-agent reinforcement learning (RL) and multi-agent game-theoretic perspectives. The first part of the talk focuses on adversarial bandits and RL with function approximation. I will introduce our advances on learning in adversarial linear mixture MDPs and low-rank MDPs. In addition, I will present our best-of-both-worlds algorithms for linear bandits, which achieve (nearly) optimal regret in both stochastic and adversarial environments, even under heavy-tailed noise distributions. The second part of the talk extends to partially observable Markov games (POMGs). I will present the first algorithm achieving last-iterate convergence in POMGs under bandit feedback, alongside pioneering algorithms for learning POMGs with linear function approximation. These algorithms enable scalable and efficient learning in high-dimensional game environments.
Collectively, these advancements demonstrate how principled algorithmic designs can overcome fundamental limitations in online learning, leading to scalable and robust decision-making in complex and dynamic environments. The contributions presented in this talk have been published in premier machine learning venues, including ICML, ICLR, NeurIPS, UAI, and AAAI.