Poster Sessions
Poster Session 1
- Is Value Learning Really the Main Bottleneck in Offline RL?
- REBEL: Reinforcement Learning via Regressing Relative Rewards
- Partially Observable Multi-Agent Reinforcement Learning using Mean Field Control
- Information Theoretic Guarantees For Policy Alignment In Large Language Models
- An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models
- Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent
- When to Sense and Control? A Time-adaptive Approach for Continuous-Time RL
- Transferable Reinforcement Learning via Generalized Occupancy Models
- Learning to Steer Markovian Agents under Model Uncertainty
- vMF-exp: von Mises-Fisher Exploration of Large Action Sets with Hyperspherical Embeddings
- No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO
- Adaptive Foundation Models for Online Decisions: HyperAgent with Fast Incremental Uncertainty Estimation
- Efficient Offline Learning of Ranking Policies via Top-$k$ Policy Decomposition
- Oracle-Efficient Reinforcement Learning for Max Value Ensembles
- Policy Gradient Methods with Adaptive Policy Spaces
- Functional Acceleration for Policy Mirror Descent
- VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation
- Multi-Agent Imitation Learning: Value is Easy, Regret is Hard
- Batch Learning via Log-Sum-Exponential Estimator from Logged Bandit Feedback
- Advantage Alignment Algorithms
- Enhancing Actor-Critic Decision-Making with Afterstate Models for Continuous Control
- In Search for Architectures and Loss Functions in Multi-Objective Reinforcement Learning
- Reinforcement Learning from Bagged Reward
- Combining Reconstruction and Contrastive Methods for Multimodal Representations in RL
- Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning
- Safe exploration in reproducing kernel Hilbert spaces
- PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling
- Towards Zero-Shot Generalization in Offline Reinforcement Learning
- Bigger, Regularized, Optimistic: scaling for compute and sample-efficient continuous control
- On the Theory of Risk-Aware Agents: Bridging Actor-Critic and Economics
- BenchMARL: Benchmarking Multi-Agent Reinforcement Learning
- Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts
- Adaptive Two-Level Quasi-Monte Carlo for Soft Actor-Critic
- Markov Persuasion Processes: How to Persuade Multiple Agents From Scratch
- Delayed Adversarial Attacks on Stochastic Multi-Armed Bandits
- RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation
- Locally Interdependent Multi-Agent MDP: Theoretical Framework for Decentralized Agents with Dynamic Dependencies
- Idea Track: Leveraging Reinforcement Learning to Enhance Decision-Making in Oncology Treatments
- Idea Track: Tight Bounds for Bernoulli Rewards in Kernelized Multi-Armed Bandits
- Idea Track: Reward Estimation in Inverse Bandit Problems
- Idea Track: Proper Hyper-parameter Optimization in Reinforcement Learning
Poster Session 2
- A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits
- Transductive Active Learning with Application to Safe Bayesian Optimization
- Realtime Reinforcement Learning: Towards Rapid Asynchronous Deployment of Large Models
- Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors
- Improved Algorithms for Adversarial Bandits with Unbounded Losses
- Survive on Planet Pandora: Robust Cross-Domain RL Under Distinct State-Action Representations
- A Case for Validation Buffer in Pessimistic Actor-Critic
- Offline RL via Feature-Occupancy Gradient Ascent
- How Does Return Distribution in Distributional Reinforcement Learning Help Optimization?
- Towards the Transferability of Rewards Recovered via Regularized Inverse Reinforcement Learning
- Exploiting Exogenous Structure for Sample-Efficient Reinforcement Learning
- Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning via the Lens of Representation Complexity
- A Theoretical Framework for Partially-Observed Reward States in RLHF
- Reinforcement Learning in the Wild with Maximum Likelihood-based Model Transfer
- Dual Approximation Policy Optimization
- Wind farm control with cooperative multi-agent reinforcement learning
- Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm
- Provable Partially Observable Reinforcement Learning with Privileged Information
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
- KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty
- Batched fixed-confidence pure exploration for bandits with switching constraints
- Offline Reinforcement Learning with Pessimistic Value Priors
- The Importance of Online Data: Understanding Preference Fine-Tuning via Coverage
- Generalized Linear Bandits with Limited Adaptivity
- Coordination Failure in Cooperative Offline MARL
- Efficient Offline Reinforcement Learning: The Critic is Critical
- EMPO: A Clustering-Based On-Policy Algorithm for Offline Reinforcement Learning
- Quantized Representations Prevent Dimensional Collapse in Self-predictive RL
- A Tractable Inference Perspective of Offline RL
- Risk-Aware Bandits for Best Crop Management
- Decoupled Stochastic Gradient Descent for N-Player Games
- ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization
- Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error
- Handling Delay in Reinforcement Learning Caused by Parallel Computations of Neurons
- Accelerated Online Reinforcement Learning using Auxiliary Start State Distributions
- Reweighted Bellman Targets for Continual Reinforcement Learning
- Should You Trust DQN?
- Bisimulation Metrics are Optimal Transport Distances, and Can be Computed Efficiently
- Idea Track: Improving Sample Efficiency in World Models through Semantic Exploration via Expert Demonstration
- Idea Track: Active Representation Learning
- Idea Track: Better Gradient Steps for Deep On-Policy Reinforcement Learning