Sequential Decision Making Team Seminar (Talk by Canzhe Zhao, Shanghai Jiao Tong University).

2025/10/24 (Fri)
04:00〜05:00
Participants

2/300

Organizer: RIKEN AIP Public

This is an online seminar. Registration is required.

【Sequential Decision Making Team】
【Date】October 24, 2025 (Fri) 13:00-14:00 (JST)
【Speaker】Canzhe Zhao, Shanghai Jiao Tong University, Department of Computer Science and Engineering

Title: Scalable Online Learning in Adversarial Environments: from Single-Agent to Multi-Agent

Abstract: Practical applications of sequential decision-making in complex and dynamic environments face critical challenges, including the curse of dimensionality and adversarial loss functions. In this talk, I will present a unified research program on scalable online learning in adversarial environments, addressing these core challenges from both single-agent reinforcement learning (RL) and multi-agent game-theoretic perspectives. The first part of the talk focuses on adversarial bandits and RL with function approximation. I will introduce our advances on learning in adversarial linear mixture MDPs and low-rank MDPs. In addition, I will present our best-of-both-worlds algorithms for linear bandits, which achieve (nearly) optimal regret in both stochastic and adversarial environments, even under heavy-tailed noise distributions. The second part of the talk extends to partially observable Markov games (POMGs). I will present the first algorithm achieving last-iterate convergence in POMGs under bandit feedback, alongside pioneering algorithms for learning POMGs with linear function approximation. These algorithms enable scalable and efficient learning in high-dimensional game environments.
Collectively, these advancements demonstrate how principled algorithmic designs can overcome fundamental limitations in online learning, leading to scalable and robust decision-making in complex and dynamic environments. The contributions presented in this talk have been published in premier machine learning venues, including ICML, ICLR, NeurIPS, UAI, and AAAI.

Workshop