Poster Sessions

Poster Session 1

  • Is Value Learning Really the Main Bottleneck in Offline RL?
  • REBEL: Reinforcement Learning via Regressing Relative Rewards
  • Partially Observable Multi-Agent Reinforcement Learning using Mean Field Control
  • Information Theoretic Guarantees For Policy Alignment In Large Language Models
  • An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models
  • Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent
  • When to Sense and Control? A Time-adaptive Approach for Continuous-Time RL
  • Transferable Reinforcement Learning via Generalized Occupancy Models
  • Learning to Steer Markovian Agents under Model Uncertainty
  • vMF-exp: von Mises-Fisher Exploration of Large Action Sets with Hyperspherical Embeddings
  • No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO
  • Adaptive Foundation Models for Online Decisions: HyperAgent with Fast Incremental Uncertainty Estimation
  • Efficient Offline Learning of Ranking Policies via Top-$k$ Policy Decomposition
  • Oracle-Efficient Reinforcement Learning for Max Value Ensembles
  • Policy Gradient Methods with Adaptive Policy Spaces
  • Functional Acceleration for Policy Mirror Descent
  • VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation
  • Multi-Agent Imitation Learning: Value is Easy, Regret is Hard
  • Batch Learning via Log-Sum-Exponential Estimator from Logged Bandit Feedback
  • Advantage Alignment Algorithms
  • Enhancing Actor-Critic Decision-Making with Afterstate Models for Continuous Control
  • In Search for Architectures and Loss Functions in Multi-Objective Reinforcement Learning
  • Reinforcement Learning from Bagged Reward
  • Combining Reconstruction and Contrastive Methods for Multimodal Representations in RL
  • Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning
  • Safe exploration in reproducing kernel Hilbert spaces
  • PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling
  • Towards Zero-Shot Generalization in Offline Reinforcement Learning
  • Bigger, Regularized, Optimistic: scaling for compute and sample-efficient continuous control
  • On the Theory of Risk-Aware Agents: Bridging Actor-Critic and Economics
  • BenchMARL: Benchmarking Multi-Agent Reinforcement Learning
  • Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts
  • Adaptive Two-Level Quasi-Monte Carlo for Soft Actor-Critic
  • Markov Persuasion Processes: How to Persuade Multiple Agents From Scratch
  • Delayed Adversarial Attacks on Stochastic Multi-Armed Bandits
  • RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation
  • Locally Interdependent Multi-Agent MDP: Theoretical Framework for Decentralized Agents with Dynamic Dependencies
  • Idea Track: Leveraging Reinforcement Learning to Enhance Decision-Making in Oncology Treatments
  • Idea Track: Tight Bounds for Bernoulli Rewards in Kernelized Multi-Armed Bandits
  • Idea Track: Reward Estimation in Inverse Bandit Problems
  • Idea Track: Proper Hyper-parameter Optimization in Reinforcement Learning

Poster Session 2

  • A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits
  • Transductive Active Learning with Application to Safe Bayesian Optimization
  • Realtime Reinforcement Learning: Towards Rapid Asynchronous Deployment of Large Models
  • Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors
  • Improved Algorithms for Adversarial Bandits with Unbounded Losses
  • Survive on Planet Pandora: Robust Cross-Domain RL Under Distinct State-Action Representations
  • A Case for Validation Buffer in Pessimistic Actor-Critic
  • Offline RL via Feature-Occupancy Gradient Ascent
  • How Does Return Distribution in Distributional Reinforcement Learning Help Optimization?
  • Towards the Transferability of Rewards Recovered via Regularized Inverse Reinforcement Learning
  • Exploiting Exogenous Structure for Sample-Efficient Reinforcement Learning
  • Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning via the Lens of Representation Complexity
  • A Theoretical Framework for Partially-Observed Reward States in RLHF
  • Reinforcement Learning in the Wild with Maximum Likelihood-based Model Transfer
  • Dual Approximation Policy Optimization
  • Wind farm control with cooperative multi-agent reinforcement learning
  • Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm
  • Provable Partially Observable Reinforcement Learning with Privileged Information
  • Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
  • KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty
  • Batched fixed-confidence pure exploration for bandits with switching constraints
  • Offline Reinforcement Learning with Pessimistic Value Priors
  • The Importance of Online Data: Understanding Preference Fine-Tuning via Coverage
  • Generalized Linear Bandits with Limited Adaptivity
  • Coordination Failure in Cooperative Offline MARL
  • Efficient Offline Reinforcement Learning: The Critic is Critical
  • EMPO: A Clustering-Based On-Policy Algorithm for Offline Reinforcement Learning
  • Quantized Representations Prevent Dimensional Collapse in Self-predictive RL
  • A Tractable Inference Perspective of Offline RL
  • Risk-Aware Bandits for Best Crop Management
  • Decoupled Stochastic Gradient Descent for N-Player Games
  • ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization
  • Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error
  • Handling Delay in Reinforcement Learning Caused by Parallel Computations of Neurons
  • Accelerated Online Reinforcement Learning using Auxiliary Start State Distributions
  • Reweighted Bellman Targets for Continual Reinforcement Learning
  • Should You Trust DQN?
  • Bisimulation Metrics are Optimal Transport Distances, and Can be Computed Efficiently
  • Idea Track: Improving Sample Efficiency in World Models through Semantic Exploration via Expert Demonstration
  • Idea Track: Active Representation Learning
  • Idea Track: Better Gradient Steps for Deep On-Policy Reinforcement Learning