Wen Sun

I'm a post-doc researcher at Microsoft Research NYC. I will join the Computer Science Department at Cornell University as an assistant professor in Fall 2020.

I completed my PhD at the Robotics Institute, Carnegie Mellon University in June 2019, where I was advised by Drew Bagnell. I also work closely with Byron Boots and Geoff Gordon. I've spent time at MSR NYC and Redmond, and Yahoo Research NYC.

CV  /  PhD Thesis  /  Google Scholar  /  Email  


I'm interested in machine learning, especially reinforcement learning. Much of my research is about designing algorithms for efficient sequential decision making, understanding exploration and exploitation, and leveraging expert demonstrations to overcome the difficulty of exploration.

Recent Talk (MSR Feb 2019)

Corruption Robust Exploration in Episodic Reinforcement Learning
(in alphabetical order) Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
arXiv, 2019  

A general framework that enables (1) active action elimination in RL, and (2) provably robust exploration under adversarial corruptions of both rewards and transitions.

Disagreement-Regularized Imitation Learning
Kiante Brantley, Wen Sun, Mikael Henaff
ICLR, 2020 (Spotlight)  

Using disagreement among an ensemble of pre-trained behavior cloning policies to reduce covariate shift in IL

Policy Poisoning in Batch Reinforcement Learning and Control
Yuzhe Ma, Xuezhou Zhang, Wen Sun, Jerry Zhu
NeurIPS, 2019   [code] [poster]

Optimal Sketching for Kronecker Product Regression and Low Rank Approximation
(in alphabetical order) Huaian Diao, Rajesh Jayaram, Zhao Song, Wen Sun, David P. Woodruff
NeurIPS, 2019

Provably Efficient Imitation Learning from Observations Alone
Wen Sun, Anirudh Vemula, Byron Boots, J. Andrew Bagnell
ICML, 2019 (Long Talk) [code] [slides]

Frame IL with observations alone as a sequence of two-player minmax games.
Polynomial sample complexity for learning a near-optimal policy with general function approximation.

Contextual Memory Tree
Wen Sun, Alina Beygelzimer, Hal Daume III, John Langford, Paul Mineiro
ICML, 2019 (Long Talk) [code] [slides]

An incremental & learnable memory system maintained in a nearly balanced tree structure to ensure logarithmic time operations

Model-based RL in CDPs: PAC bounds and Exponential Improvements over Model-free Approaches
Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford
COLT, 2019 [slides]

A theoretical comparison between model-based RL and model-free RL.
A sample efficient model-based RL algorithm.

Contrasting Exploration in Parameter and Action Space: A Zeroth-Order Optimization Perspective
Anirudh Vemula, Wen Sun, Drew Bagnell
AISTATS, 2019 [code] [poster]

Exploration in action space can be much more efficient than zeroth-order methods when the number of policy parameters is far larger than the dimension of the action space and the planning horizon.

Dual Policy Iteration
Wen Sun, Geoff Gordon, Byron Boots, Drew Bagnell
NeurIPS, 2018 [code] [slides]

Leverage model-based control (e.g., iLQR) and reward-aware imitation learning (e.g., AggreVaTeD) to doubly boost policy improvement

Truncated Horizon Policy Search: Combining Reinforcement Learning and Imitation Learning
Wen Sun, Drew Bagnell, Byron Boots
ICLR, 2018 [poster]

Combination of IL & RL: use the expert's value function for reward shaping to shorten the planning horizon, which in turn speeds up RL

Sketching for Kronecker Product Regression and P-splines
(in alphabetical order) Huaian Diao, Zhao Song, Wen Sun, David P. Woodruff
AISTATS, 2018   (Oral)

Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction
Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell
ICML, 2017   (also selected for oral presentation at RLDM 2017) [code] [slides]

Can be viewed as an actor-critic algorithm with the critic being the expert's state-action Q function; exponential sample complexity separation between IL and pure RL

Safety-Aware Algorithms for Adversarial Contextual Bandit
Wen Sun, Debadeepta Dey, Ashish Kapoor
ICML, 2017 [slides]

Minimizing regret while keeping average risk below a pre-specified safety threshold in the long term

Learning to Filter with Predictive State Inference Machines
Wen Sun, Arun Venkatraman, Byron Boots, J. Andrew Bagnell
ICML, 2016 [slides]

Learning to predict the future recurrently.
Can be viewed as a recurrent structure whose hidden state encodes information for accurately predicting the future.

Online Bellman Residual Algorithms with Predictive Error Guarantees
Wen Sun, Drew Bagnell
UAI, 2015   (Best Student Paper Award) [slides]

Adversarial online policy evaluation.
A reduction from adversarial policy evaluation to general no-regret & stable online learning.
