Wen Sun

I'm an Assistant Professor in the Computer Science Department at Cornell University.

Prior to Cornell, I was a postdoctoral researcher at Microsoft Research NYC from 2019 to 2020. I completed my PhD at the Robotics Institute, Carnegie Mellon University, in June 2019, where I was advised by Drew Bagnell. I also worked closely with Byron Boots and Geoff Gordon. I have spent time at MSR NYC, MSR Redmond, and Yahoo Research NYC as a research intern.

CV  /  PhD Thesis  /  Google Scholar  /  Email  

Prospective students, please read this.

I'm interested in machine learning, especially reinforcement learning. Much of my research is about designing algorithms for efficient sequential decision making, understanding the tradeoff between exploration and exploitation, and leveraging expert demonstrations to overcome the challenge of exploration.


Spring 2021: CS 4789/5789 Introduction to Reinforcement Learning

Fall 2020: CS 6789 Foundations of Reinforcement Learning

Recent Talks

Reinforcement Learning: Theory and Algorithms
Alekh Agarwal, Nan Jiang, Sham Kakade, Wen Sun

We periodically update the book draft. The content is based on the courses taught by Nan at UIUC, the courses taught by Alekh and Sham at UW, and CS 6789 at Cornell.

Optimism is All You Need: Model-Based Imitation Learning From Observation Alone
Rahul Kidambi, Jonathan Chang, Wen Sun
arXiv, 2021  

IL from observations is strictly harder than classic IL; we incorporate exploration into the min-max IL framework (balancing exploration and imitation) to solve IL from observations near-optimally in theory and efficiently in practice.

Fairness of Exposure in Stochastic Bandits
Lequn Wang, Yiwei Bai, Wen Sun, Thorsten Joachims
arXiv, 2021

Robust Policy Gradient against Strong Data Corruption
Xuezhou Zhang, Yiding Chen, Xiaojin Zhu, Wen Sun
arXiv, 2021   [code]

The on-policy nature and incremental updates of policy gradient (PG) make it robust to strong adversarial corruption; our TRPO/NPG-based implementation scales to high-dimensional control tasks while remaining robust to strong data corruption.

Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency
Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, Tengyang Xie
arXiv, 2021

Corruption Robust Exploration in Episodic Reinforcement Learning
(in alphabetical order) Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
arXiv, 2020  

A general framework that enables (1) active action elimination in RL and (2) provably robust exploration under adversarial corruption of both rewards and transitions.

PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning
(in alphabetical order) Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun
NeurIPS, 2020  

We study the advantages of on-policy policy gradient methods over off-policy methods such as Q-learning, and provide a new PG algorithm that explores using a policy cover.

Information Theoretic Regret Bounds for Online Nonlinear Control
(in alphabetical order) Sham Kakade, Akshay Krishnamurthy, Kendall Lowrey, Motoya Ohnishi, Wen Sun
NeurIPS, 2020   [video]

We study learning to control nonlinear systems captured by RKHS or Gaussian processes. Although the setting is more general, the regret bound is near-optimal when specialized to LQR.

FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
(in alphabetical order) Alekh Agarwal, Sham Kakade, Akshay Krishnamurthy, Wen Sun
NeurIPS, 2020 (Oral)  

Representation learning in RL needs to be done jointly with exploration; we show how to do this correctly and rigorously.

Constrained Episodic Reinforcement Learning in Concave-convex and Knapsack Settings
(in alphabetical order) Kiante Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
NeurIPS, 2020  

We study multi-objective RL and show how to explore cautiously under various constraints.

Multi-Robot Collision Avoidance under Uncertainty with Probabilistic Safety Barrier Certificates
Wenhao Luo, Wen Sun, Ashish Kapoor
NeurIPS, 2020 (Spotlight)

Provably Efficient Model-based Policy Adaptation
Yuda Song, Aditi Mavalankar, Wen Sun, Sicun Gao
ICML, 2020   [video & code]

We study sim-to-real transfer under a model-based framework, resulting in an algorithm that enjoys strong theoretical guarantees and excellent empirical performance.

Imitation Learning as f-Divergence Minimization
Liyiming Ke, Sanjiban Choudhury, Matt Barnes, Wen Sun, Gilwoo Lee, Siddhartha Srinivasa
WAFR, 2020

We unify imitation learning by casting it as an f-divergence minimization problem.
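For orientation, here is the generic form of such an objective, written in our own notation as an assumption of this sketch: ρ_{π*} and ρ_π denote the state-action distributions induced by the expert and the learner, and f is convex with f(1) = 0. The specific distributions and choices of f analyzed in the paper may differ.

```latex
\min_{\pi}\; D_f\!\left(\rho_{\pi^*} \,\|\, \rho_{\pi}\right),
\qquad
D_f(P \,\|\, Q) \;=\; \mathbb{E}_{x \sim Q}\!\left[ f\!\left(\frac{P(x)}{Q(x)}\right) \right].
```

For example, f(t) = t log t gives the KL divergence KL(P || Q), while f(t) = -log t gives the reverse direction KL(Q || P). Different choices of f therefore correspond to different imitation objectives.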

Disagreement-Regularized Imitation Learning
Kiante Brantley, Wen Sun, Mikael Henaff
ICLR, 2020 (Spotlight)   [code]

Uses disagreement among an ensemble of pretrained behavior cloning policies to reduce covariate shift in IL.
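Below is a minimal sketch of an ensemble-disagreement cost. It assumes discrete actions, with each ensemble member outputting a probability vector over actions. The function names, the variance form, and the ±1 clipping threshold are illustrative assumptions, not the paper's exact definitions. Low variance suggests the current state resembles the demonstration data. High variance flags states where the learner has drifted away from the expert.

```python
from statistics import pvariance
from typing import Sequence


def disagreement_cost(ensemble_probs: Sequence[Sequence[float]], action: int) -> float:
    """Variance across ensemble members of the probability each assigns to `action`.

    Hypothetical input format: ensemble_probs[i][a] is the probability that
    behavior-cloning policy i (trained on the demonstrations) assigns to action a
    in the current state.
    """
    return pvariance([probs[action] for probs in ensemble_probs])


def clipped_cost(cost: float, threshold: float) -> float:
    """Hypothetical clipping rule: +1 for high disagreement, -1 otherwise."""
    return 1.0 if cost > threshold else -1.0


if __name__ == "__main__":
    agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
    disagree = [[0.9, 0.1], [0.1, 0.9]]
    print(disagreement_cost(agree, 0))     # members agree: zero variance
    print(disagreement_cost(disagree, 0))  # members disagree: positive variance
```

In a full method, a cost like this would be minimized by an RL algorithm alongside the behavior cloning loss. The threshold here is a free parameter of the sketch.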

Policy Poisoning in Batch Reinforcement Learning and Control
Yuzhe Ma, Xuezhou Zhang, Wen Sun, Jerry Zhu
NeurIPS, 2019   [code] [poster]

Optimal Sketching for Kronecker Product Regression and Low Rank Approximation
(in alphabetical order) Huaian Diao, Rajesh Jayaram, Zhao Song, Wen Sun, David P. Woodruff
NeurIPS, 2019

Provably Efficient Imitation Learning from Observations Alone
Wen Sun, Anirudh Vemula, Byron Boots, J. Andrew Bagnell
ICML, 2019 (Long Talk) [code] [slides]

Frames IL from observations alone as a sequence of two-player min-max games.
Polynomial sample complexity for learning a near-optimal policy with general function approximation.

Contextual Memory Tree
Wen Sun, Alina Beygelzimer, Hal Daume III, John Langford, Paul Mineiro
ICML, 2019 (Long Talk) [code] [slides]

An incremental and learnable memory system maintained in a nearly balanced tree structure to ensure logarithmic-time operations.

Model-based RL in CDPs: PAC bounds and Exponential Improvements over Model-free Approaches
Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford
COLT, 2019 [slides]

A theoretical comparison between model-based RL and model-free RL.
A sample efficient model-based RL algorithm.

Contrasting Exploration in Parameter and Action Space: A Zeroth-Order Optimization Perspective
Anirudh Vemula, Wen Sun, Drew Bagnell
AISTATS, 2019 [code] [poster]

Exploration in action space can be much more efficient than exploration in parameter space when the number of policy parameters is much larger than the dimension of the action space and the planning horizon.

Dual Policy Iteration
Wen Sun, Geoff Gordon, Byron Boots, Drew Bagnell
NeurIPS, 2018 [code] [slides]

Leverages model-based control (e.g., iLQR) and reward-aware imitation learning (e.g., AggreVaTeD) to doubly boost policy improvement.

Truncated Horizon Policy Search: Combining Reinforcement Learning and Imitation Learning
Wen Sun, Drew Bagnell, Byron Boots
ICLR, 2018 [poster]

Combines IL and RL: uses the expert's value function for reward shaping to shorten the planning horizon, which in turn speeds up RL.
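Below is a small sketch of the reward-shaping arithmetic only. It assumes potential-based shaping with the expert's value function V_e as the potential, so the shaped reward is r_t + γ V_e(s_{t+1}) - V_e(s_t), followed by a k-step discounted sum. The names `shaped_rewards` and `truncated_return` are hypothetical, and the actual method optimizes a policy on the truncated problem, which this sketch does not do.

```python
from typing import Callable, Hashable, List, Sequence


def shaped_rewards(
    states: Sequence[Hashable],
    rewards: Sequence[float],
    expert_value: Callable[[Hashable], float],
    gamma: float,
) -> List[float]:
    """Potential-based shaping with the expert's value function as the potential.

    `states` holds s_0..s_T, one more entry than `rewards`, which holds r_0..r_{T-1}.
    Shaped reward at step t: r_t + gamma * V_e(s_{t+1}) - V_e(s_t).
    """
    if len(states) != len(rewards) + 1:
        raise ValueError("need exactly one more state than rewards")
    return [
        r + gamma * expert_value(s_next) - expert_value(s)
        for s, r, s_next in zip(states[:-1], rewards, states[1:])
    ]


def truncated_return(shaped: Sequence[float], gamma: float, k: int) -> float:
    """Discounted sum of the first k shaped rewards (a truncated-horizon objective)."""
    return sum(gamma**t * r for t, r in enumerate(shaped[:k]))
```

The shaping terms telescope. The k-step shaped return equals the k-step original return plus γ^k V_e(s_k) - V_e(s_0). When V_e is informative, the expert's value estimate therefore stands in for everything beyond step k, which is why a short horizon can suffice.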

Sketching for Kronecker Product Regression and P-splines
(by alphabetic order) Huaian Diao, Zhao Song, Wen Sun, David P. Woodruff
AISTATS, 2018   (Oral)

Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction
Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell
ICML, 2017   (also selected for oral presentation at RLDM 2017) [code] [slides]

Can be viewed as an actor-critic algorithm whose critic is the expert's state-action Q function; shows an exponential sample complexity separation between IL and pure RL.
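To make the actor-critic reading concrete, here is a rough sketch in our own notation. The symbols π_θ for the learner's policy, d^{π_θ} for its state distribution, and Q^e for the expert's state-action value are assumptions of this sketch. The state distribution is held fixed at the current policy's when differentiating, as in an online-learning view. The resulting actor update is a likelihood-ratio policy gradient in which Q^e takes the place of the learner's own critic.

```latex
\nabla_\theta J(\theta) \;\approx\;
\mathbb{E}_{s \sim d^{\pi_\theta},\; a \sim \pi_\theta(\cdot \mid s)}
\left[ \nabla_\theta \log \pi_\theta(a \mid s)\; Q^{e}(s, a) \right].
```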

Safety-Aware Algorithms for Adversarial Contextual Bandit
Wen Sun, Debadeepta Dey, Ashish Kapoor
ICML, 2017 [slides]

Minimizes regret while keeping the long-term average risk below a pre-specified safety threshold.

Gradient Boosting on Stochastic Data Streams
Hanzhang Hu, Wen Sun, Arun Venkatraman, Martial Hebert, Drew Bagnell

Learning to Filter with Predictive State Inference Machines
Wen Sun, Arun Venkatraman, Byron Boots, J. Andrew Bagnell
ICML, 2016 [slides]

Learns to predict the future recurrently.
Can be viewed as a recurrent structure whose hidden state encodes the information needed to accurately predict the future.

Online Bellman Residual Algorithms with Predictive Error Guarantees
Wen Sun, Drew Bagnell
UAI, 2015   Best Student Paper Award [slides]

Adversarial online policy evaluation.
A reduction from adversarial policy evaluation to general no-regret & stable online learning.
