Theoretical Foundations of Reinforcement Learning Workshop

Introduction

In many settings such as education, healthcare, drug design, robotics, transportation, and achieving better-than-human performance in strategic games, it is important to make decisions sequentially. This poses two interconnected algorithmic and statistical challenges: effectively exploring to learn information about the underlying dynamics and effectively planning using this information. Reinforcement Learning (RL) is the main paradigm tackling both of these challenges simultaneously which is essential in the aforementioned applications. Over the last years, reinforcement learning has seen enormous progress both in solidifying our understanding on its theoretical underpinnings and in applying these methods in practice.

This workshop aims to highlight recent theoretical contributions, with an emphasis on addressing significant challenges on the road ahead. Such theoretical understanding is important in order to design algorithms that have robust and compelling performance in real-world applications. As part of the ICML 2020 conference, this workshop will be held virtually. It will feature keynote talks from six reinforcement learning experts tackling different significant facets of RL. It will also offer the opportunity for contributed material (see below the call for papers and our outstanding program committee). The authors of each accepted paper will prerecord a 10-minute presentation and will also appear in a poster session. Finally, the workshop will have a panel discussing important challenges in the road ahead.

Contributed Papers

Early Poster Session

(10) Provable Hierarchical Imitation Learning via EM
Zhiyu Zhang, Ioannis Paschalidis
[video]
(12) Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP
Amy Zhang, Shagun Sodhani, Khimya Khetarpal, Joelle Pineau
[arXiv]
(14) Sample Complexity of Estimating the Policy Gradient for Nearly Deterministic Dynamical Systems
Osbert Bastani
[arXiv]
(15) If MaxEnt RL is the Answer, What is the Question?
Benjamin Eysenbach, Sergey Levine
[arXiv] [video]
(17) Online Markov Decision Processes with Max-Min Fairness
Wang Chi Cheung
(24) Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang
[arXiv]
(28) Power-Constrained Bandits
Jiayu Yao, Emma Brunskill, Weiwei Pan, Susan Murphy, Finale Doshi-Velez
[arXiv]
(35) Adaptive Reward-Free Exploration
Emilie Kaufmann, Pierre MENARD, Omar Darwiche Domingues, Anders Jonsson, Edouard Leurent, Michal Valko
[arXiv] [video]
(38) Near-Optimal Reinforcement Learning with Self-Play
Yu Bai, Chi Jin, Tiancheng Yu
[arXiv] [video]
(40) Reinforcement Learning with Feedback Graphs
Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan
[arXiv]
(41) Learning Implicit Credit Assignment for Multi-Agent Actor-Critic
Meng Zhou, Ziyu Liu, Pengwei Sui, Yixuan Li, Yuk Ying Chung
[arXiv] [video]
(42) Refined Analysis of FPL for Adversarial Markov Decision Processes
Yuanhao Wang, Kefan Dong
(47) A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces
Omar Darwiche Domingues, Pierre MENARD, Matteo Pirotta, Emilie Kaufmann, Michal Valko
[arXiv] [video]
(50) Provably Efficient Exploration for Reinforcement Learning with Unsupervised Learning
Fei Feng, Ruosong Wang, Wotao Yin, Simon Shaolei Du, lin Yang
[arXiv] [video]
(52) Multi-Armed Bandits with Correlated Arms
Samarth Gupta, Shreyas Chaudhari, Gauri Joshi, Osman Yagan
[arXiv] [video]
(54) TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?
Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau
[arXiv]
(55) Sharp Analysis of Smoothed Bellman Error Embedding
Ahmed Touati, Pascal Vincent
[arXiv]
(60) Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition
Tiancheng Jin, Haipeng Luo
[arXiv]
(62) Adaptive Discretization for Model-Based Reinforcement Learning
Sean R. Sinclair, Tianyu Wang, Gauri Jain, Sid Banerjee, Christina Yu
[arXiv]
(64) Exploration-Exploitation in Constrained MDPs
Yonathan Efroni, Shie Mannor, Matteo Pirotta
[arXiv]
(66) Learning the Linear Quadratic Regulator from Nonlinear Observations
Zakaria Mhammedi, Dylan J Foster, Max Simchowitz, Dipendra Misra, Wen Sun, Akshay Krishnamurthy, Alexander Rakhlin, John Langford
(69) Finite-Sample Analysis of Stochastic Approximation Using Smooth Convex Envelopes
Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam
[arXiv] [video]
(81) Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability
David Simchi-Levi, Yunzong Xu
[arXiv]
(83) Set-Invariant Constrained Reinforcement Learning with a Meta-Optimizer
Chuangchuang Sun, Dong-Ki Kim, JONATHAN P HOW
[arXiv]
(85) Finding Equilibrium in Multi-Agent Games with Payoff Uncertainty
Wenshuo Guo, Mihaela Curmei, Serena Wang, Benjamin Recht
[arXiv]
(88) Reward-Free Exploration beyond Finite-Horizon
Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric
[paper] [video]
(96) Control as Hybrid Inference
Alexander Tschantz, Beren Millidge, Anil K Seth, Christopher Buckley
[arXiv]
(34) Distributional Robustness and Regularization in Reinforcement Learning
Esther Derman, Shie Mannor
[arXiv] [video]

Late Poster Session

(2) Minimax Confidence Interval for Off-Policy Evaluation and Policy Optimization
Nan Jiang, Jiawei Huang
[arXiv]
(4) An operator view of policy gradient methods
Dibya Ghosh, Marlos C. Machado, Nicolas Le Roux
[arXiv] [video]
(5) Crush Optimism with Pessimism: Structured Bandits Beyond Asymptotic Optimality
Kwang-Sung Jun, Chicheng Zhang
[arXiv]
(13) Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies
Nathan Kallus, Masatoshi Uehara
(16) Q-Learning Algorithm for Mean-Field Controls, with Convergence and Complexity Analysis
Haotian Gu, Xin Guo, Xiaoli Wei, Renyuan Xu
[arXiv] [video]
(21) PAC Imitation and Model-based Batch Learning of Contextual MDPs
Yash Nair, Finale Doshi-Velez
[arXiv]
(23) A Decentralized Policy Gradient Approach toMulti-task Reinforcement Learning
Sihan Zeng, Aqeel Anwar, Thinh T. Doan, Arijit Raychowdhury, Justin Romberg
[arXiv]
(25) A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods
Yue Wu, Weitong ZHANG, Pan Xu, Quanquan Gu
(33) Provably More Efficient Q-Learning in the Full-Feedback/One-Sided-Feedback Settings
Xiao-Yue Gong, David Simchi-Levi
[arXiv]
(39) Sample-Efficient Reinforcement Learning of Undercomplete POMDPs
Chi Jin, Sham M. Kakade, Akshay Krishnamurthy, Qinghua Liu
[arXiv] [video]
(44) Bandit Linear Control
Asaf Benjamin Cassel, Tomer Koren
[arXiv]
(45) Robust Reinforcement Learning via Adversarial training with Langevin Dynamics
Parameswaran Kamalaruban, Yu-Ting Huang, Ya-Ping Hsieh, Paul Rolland, Cheng Shi, Volkan Cevher
[arXiv]
(51) The Mean-Squared Error of Double Q-Learning
Wentao Weng, Harsh Gupta, Niao He, Lei Ying, R. Srikant
(53) Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning
Nathan Kallus, Angela Zhou
[arXiv]
(56) Minimax Model Learning
Cameron Voloshin, Nan Jiang, Yisong Yue
(57) Preference learning along multiple criteria: A game-theoretic perspective
Kush Bhatia, Ashwin Pananjady, Peter Bartlett, Anca Dragan, Martin Wainwright
(58) Provably Good Batch Reinforcement Learning Without Great Exploration
Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill
(68) Black-Box Control for Linear Dynamical Systems
Xinyi Chen, Elad Hazan
(71) Finite Sample Analysis of Two-Time-Scale Natural Actor-Critic Algorithm
Sajad Khodadadian, Thinh T. Doan, Siva Theja Maguluri, Justin Romberg
(72) Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems
Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi, Anima Anandkumar
[arXiv]
(73) Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
Koulik Khamaru, Ashwin Pananjady, Feng Ruan, Martin J. Wainwright, Michael Jordan
(75) Smoothness-Adaptive Contextual Bandits
Yonatan Gur, Ahmadreza Momeni, Stefan Wager
[arXiv]
(84) Geometric Exploration for Online Control
Orestis Plevrakis, Elad Hazan
[video]
(86) Conservative Q-Learning for Offline Reinforcement Learning
Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine
[ArXiv]
(89) Efficient MDP Analysis for Selfish-Mining in Blockchain
Ittay Eyal, Aviv Tamar
(95) Adaptive Regret for Online Control
Paula Gradu, Elad Hazan, Edgar Minasyan
(97) Generalized Chernoff Sampling: A New Perspective on Structured Bandit Algorithms
Subhojyoti Mukherjee, Ardhendu Tripathy, Robert D Nowak
[video]
(100) Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes
Yi Tian, Jian Qian, Suvrit Sra
[arXiv]
(102) Model Selection for Finite and Continuous-Armed Stochastic Contextual Bandits
Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran
[arXiv]

#ShutDownSTEM

Introduction

Schedule

Keynote Speakers

Shipra Agrawal

Sham Kakade

Akshay Krishnamurthy

Gergely Neu

Csaba Szepesvari

Martha White

Contributed Papers

Program Committee

Important Dates

Workshop Organizers

Emma Brunskill

Thodoris Lykouris

Max Simchowitz

Wen Sun

Mengdi Wang