Wen Sun

I'm an Assistant Professor in the Computer Science Department at Cornell University.

Prior to Cornell, I was a postdoctoral researcher at Microsoft Research NYC from 2019 to 2020. I completed my PhD at the Robotics Institute, Carnegie Mellon University, in June 2019, where I was advised by Drew Bagnell. I also worked closely with Byron Boots and Geoff Gordon. I've spent time at MSR NYC and Redmond, and at Yahoo Research NYC, as a research intern.

CV  /  PhD Thesis  /  Google Scholar  /  Email  


I'm interested in machine learning, especially Reinforcement Learning. Much of my research is about designing algorithms for efficient sequential decision making, understanding exploration and exploitation, and how to leverage expert demonstrations to overcome exploration.


  • We are organizing a workshop at ICML 2020 on Theoretical Foundations of Reinforcement Learning. Please consider submitting your work!
  • Recent Talk (MSR Feb 2019)

    Information Theoretic Regret Bounds for Online Nonlinear Control
    (in alphabetical order) Sham Kakade, Akshay Krishnamurthy, Kendall Lowrey, Motoya Ohnishi, Wen Sun
    arXiv, 2020   [video]
    FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
    (in alphabetical order) Alekh Agarwal, Sham Kakade, Akshay Krishnamurthy, Wen Sun
    arXiv, 2020  
    Constrained Episodic Reinforcement Learning in Concave-convex and Knapsack Settings
    (in alphabetical order) Kiante Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
    arXiv, 2020  
    Corruption Robust Exploration in Episodic Reinforcement Learning
    (in alphabetical order) Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
    arXiv, 2019  

    A general framework that enables (1) active action elimination in RL and (2) provably robust exploration under adversarial corruptions of both rewards and transitions.

    Provably Efficient Model-based Policy Adaptation
    Yuda Song, Aditi Mavalankar, Wen Sun, Sicun Gao
    ICML, 2020   [video & code]
    Imitation Learning as f-Divergence Minimization
    Liyiming Ke, Sanjiban Choudhury, Matt Barnes, Wen Sun, Gilwoo Lee, Siddhartha Srinivasa
    WAFR, 2020  
    Disagreement-Regularized Imitation Learning
    Kiante Brantley, Wen Sun, Mikael Henaff
    ICLR, 2020 (Spotlight)   [code]

    Using disagreement among an ensemble of pre-trained behavior cloning policies to reduce covariate shift in IL

    Policy Poisoning in Batch Reinforcement Learning and Control
    Yuzhe Ma, Xuezhou Zhang, Wen Sun, Jerry Zhu
    NeurIPS, 2019   [code] [poster]
    Optimal Sketching for Kronecker Product Regression and Low Rank Approximation
    (in alphabetical order) Huaian Diao, Rajesh Jayaram, Zhao Song, Wen Sun, David P. Woodruff
    NeurIPS, 2019  
    Provably Efficient Imitation Learning from Observations Alone
    Wen Sun, Anirudh Vemula, Byron Boots, J. Andrew Bagnell
    ICML, 2019 (Long Talk) [code] [slides]

    Frames IL from observations alone as a sequence of two-player min-max games.
    Polynomial sample complexity for learning a near-optimal policy with general function approximation.

    Contextual Memory Tree
    Wen Sun, Alina Beygelzimer, Hal Daume III, John Langford, Paul Mineiro
    ICML, 2019 (Long Talk) [code] [slides]

    An incremental & learnable memory system maintained in a nearly balanced tree structure to ensure logarithmic time operations

    Model-based RL in CDPs: PAC bounds and Exponential Improvements over Model-free Approaches
    Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford
    COLT, 2019 [slides]

    A theoretical comparison between model-based RL and model-free RL.
    A sample efficient model-based RL algorithm.

    Contrasting Exploration in Parameter and Action Space: A Zeroth-Order Optimization Perspective
    Anirudh Vemula, Wen Sun, Drew Bagnell
    AISTATS, 2019 [code] [poster]

    Exploration in action space can be much more efficient than zeroth-order exploration in parameter space when the number of policy parameters is much larger than the action dimension and the planning horizon.

    Dual Policy Iteration
    Wen Sun, Geoff Gordon, Byron Boots, Drew Bagnell
    NeurIPS, 2018 [code] [slides]

    Leverages model-based control (e.g., iLQR) and reward-aware imitation learning (e.g., AggreVaTeD) to get a double boost in policy improvement

    Truncated Horizon Policy Search: Combining Reinforcement Learning and Imitation Learning
    Wen Sun, Drew Bagnell, Byron Boots
    ICLR, 2018 [poster]

    Combines IL and RL: uses the expert's value function for reward shaping to shorten the planning horizon, which in turn speeds up RL

    Sketching for Kronecker Product Regression and P-splines
    (in alphabetical order) Huaian Diao, Zhao Song, Wen Sun, David P. Woodruff
    AISTATS, 2018   (Oral)
    Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction
    Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell
    ICML, 2017   (also selected for oral presentation at RLDM 2017) [code] [slides]

    Can be viewed as an actor-critic algorithm whose critic is the expert's state-action Q-function; shows an exponential separation in sample complexity between IL and pure RL

    Safety-Aware Algorithms for Adversarial Contextual Bandit
    Wen Sun, Debadeepta Dey, Ashish Kapoor
    ICML, 2017 [slides]

    Minimizing regret while keeping the long-term average risk below a pre-specified safety threshold

    Gradient Boosting on Stochastic Data Streams
    Hanzhang Hu, Wen Sun, Arun Venkatraman, Martial Hebert, Drew Bagnell
    AISTATS, 2017

    Learning to Filter with Predictive State Inference Machines
    Wen Sun, Arun Venkatraman, Byron Boots, J. Andrew Bagnell
    ICML, 2016 [slides]

    Learning to predict the future recurrently.
    Can be viewed as a recurrent structure whose hidden state encodes the information needed to accurately predict the future

    Online Bellman Residual Algorithms with Predictive Error Guarantees
    Wen Sun, Drew Bagnell
    UAI, 2015   Best Student Paper Award [slides]

    Adversarial online policy evaluation.
    A reduction from adversarial policy evaluation to general no-regret & stable online learning.
