Schedule | ARLET

Time	Type	Title & Speakers
9:00 a.m.	Opening Remarks [video]	Alberto Metelli (Politecnico di Milano)
9:05 a.m.	Invited Talk [slides] [video]	The Rise of Reinforcement Learning: from One to Many. Speaker: Niao He (ETH Zurich). Abstract: Reinforcement learning (RL), combined with deep neural networks, is key to the boom of recent AI breakthroughs from game mastery to control automation. However, their successes are overly reliant on brute-force computing power and engineering tricks, leaving wide gaps between practice and theory. The lack of theoretical foundations is even more pronounced as we shift from single-agent to many-agent RL, in addressing complex dynamic systems and decision making such as resource allocation, traffic management, and social interaction. The challenges inherent in learning many-agent systems stem not only from the increased computational and strategic complexities but also from practical limitations in coordination and exploration. In this talk, I will shed light on promising principles that break the curses of many-agent RL, focusing on mean-field approximation theory, statistical complexity, and independent learning. This will further pave the way for scalable and principled solutions to unlock the full potential of RL for next-generation AI.
10:00 a.m.	Invited Talk [slides] [video]	Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning. Speaker: Dylan Foster (Microsoft Research). Abstract: Imitation learning (IL) aims to mimic the behavior of an expert in a sequential decision making task by learning from demonstrations, and has been widely applied to robotics, autonomous driving, and autoregressive language generation. The simplest approach to IL, behavior cloning (BC), is thought to incur sample complexity with unfavorable quadratic dependence on the problem horizon, motivating a variety of different online algorithms that attain improved linear horizon dependence under stronger assumptions on the data and the learner’s access to the expert. In this talk, we revisit the apparent gap between offline and online IL from a learning-theoretic perspective, with a focus on general policy classes up to and including deep neural networks. Through a new analysis of behavior cloning with the logarithmic loss, we will show that it is possible to achieve horizon-independent sample complexity in offline IL whenever (i) the range of the cumulative payoffs is controlled, and (ii) an appropriate notion of supervised learning complexity for the policy class is controlled. When specialized to stationary policies, this implies that the gap between offline and online IL is not fundamental. We will then discuss implications of this result and investigate the extent to which it bears out empirically.
10:45 a.m.		Break
11:00 a.m.	Invited Talk [slides] [video]	Reinforcement Learning at the Hyperscale. Speaker: Jakob Foerster (University of Oxford). Abstract: Deep reinforcement learning is currently undergoing a revolution of scale, fuelled by jointly running the environment, data collection, and training loop on the GPU, which has resulted in orders of magnitude of speed-up for many tasks. In this talk I start by presenting examples of our recent work which have been enabled by this revolution, spanning multi-agent RL, meta-learning, and environment discovery. I will end the talk by outlining failure modes of relying on GPU accelerated environments and possible paradigms for the community to collectively address them, ranging from promising research directions to novel evaluation protocols.
11:45 a.m.	Contributed Talks [video]	Is Value Learning Really the Main Bottleneck in Offline RL? Speaker: Seohong Park (UC Berkeley); REBEL: Reinforcement Learning via Regressing Relative Rewards. Speaker: Gokul Swamy (Cornell University); Partially Observable Multi-Agent Reinforcement Learning using Mean Field Control. Speaker: Kai Cui (TU Darmstadt); Information Theoretic Guarantees For Policy Alignment in Large Language Models. Speaker: Youssef Mroueh (IBM Research)
12:25 p.m.		Lunch Break
1:25 p.m.		Poster Session 1
2:25 p.m.	Panel Discussion [video]	Moderator: Csaba Szepesvari (University of Alberta). Panelists: Marcello Restelli (Politecnico di Milano), Sergey Levine (UC Berkeley), Akshay Krishnamurthy (Microsoft Research), Martha White (University of Alberta)
3:25 p.m.		Coffee Break
3:40 p.m.	Contributed Talks [video]	A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits. Speaker: Kwang-Sung Jun (University of Arizona); Transductive Active Learning with Application to Safe Bayesian Optimization. Speaker: Jonas Hübotter (ETH Zurich)
4:00 p.m.		Poster Session 2