I'm interested in machine learning, especially Reinforcement Learning. Much of my research is about designing algorithms for efficient sequential decision making, understanding the tradeoff between exploration and exploitation, and leveraging expert demonstrations to overcome the challenge of exploration.
Recent Talk (MSR Feb 2019)
Information Theoretic Regret Bounds for Online Nonlinear Control
(by alphabetic order) Sham Kakade, Akshay Krishnamurthy, Kendall Lowrey, Motoya Ohnishi, Wen Sun
FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
(by alphabetic order) Alekh Agarwal, Sham Kakade, Akshay Krishnamurthy, Wen Sun
Constrained Episodic Reinforcement Learning in Concave-convex and Knapsack Settings
(by alphabetic order) Kiante Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
Corruption Robust Exploration in Episodic Reinforcement Learning
(by alphabetic order) Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
A general framework that enables (1) active action elimination in RL, and (2) provably robust exploration under adversarial corruptions of both rewards and transitions.
Provably Efficient Model-based Policy Adaptation
Yuda Song, Aditi Mavalankar, Wen Sun, Sicun Gao
[video & code]
Imitation Learning as f-Divergence Minimization
Liyiming Ke, Sanjiban Choudhury, Matt Barnes, Wen Sun, Gilwoo Lee, Siddhartha Srinivasa
Disagreement-Regularized Imitation Learning
Kiante Brantley, Wen Sun, Mikael Henaff
Using disagreement among an ensemble of pre-trained behavior cloning policies to reduce covariate shift in IL
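The core idea fits in a few lines. Below is a minimal sketch assuming an ensemble of deterministic continuous-action policies; all names and interfaces here are illustrative, not the paper's code:

```python
import numpy as np

def disagreement_penalty(policies, state):
    """Total variance of the ensemble's predicted actions at `state`.
    High disagreement flags states outside the expert's support."""
    actions = np.stack([pi(state) for pi in policies])  # (k, action_dim)
    return actions.var(axis=0).sum()

def regularized_reward(policies, state, action, lam=1.0):
    """Reward matching the ensemble's mean action, minus a penalty
    for visiting high-disagreement (covariate-shifted) states."""
    mean_action = np.mean([pi(state) for pi in policies], axis=0)
    match = -np.sum((action - mean_action) ** 2)
    return match - lam * disagreement_penalty(policies, state)
```

Training the learner against this regularized reward discourages it from drifting into states where the cloned policies disagree, i.e., states the expert never visited.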
Policy Poisoning in Batch Reinforcement Learning and Control
Yuzhe Ma, Xuezhou Zhang, Wen Sun, Jerry Zhu
Optimal Sketching for Kronecker Product Regression and Low Rank Approximation
(by alphabetic order) Huaian Diao, Rajesh Jayaram, Zhao Song, Wen Sun, David P. Woodruff
Provably Efficient Imitation Learning from Observations Alone
J. Andrew Bagnell,
(Long Talk) [code]
Frames IL with observations alone as a sequence of two-player min-max games.
Polynomial sample complexity for learning a near-optimal policy with general function approximation.
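Schematically (the notation here is ours, not the paper's), each time step $t$ poses a distribution-matching game between the learner's policy and a discriminator class $\mathcal{F}$:

```latex
\min_{\pi_t} \; \max_{f \in \mathcal{F}} \;
  \mathbb{E}_{s_{t+1} \sim \pi_t}\!\left[ f(s_{t+1}) \right]
  - \mathbb{E}_{s_{t+1} \sim \pi^{\star}}\!\left[ f(s_{t+1}) \right]
```

where $\pi^{\star}$ is the expert: the learner only needs to match the expert's distribution over next observations, with no access to the expert's actions.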
Contextual Memory Tree
Hal Daume III,
(Long Talk) [code]
An incremental & learnable memory system maintained in a nearly balanced tree structure, ensuring logarithmic-time operations.
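A toy sketch of the tree structure: internal nodes route queries with a hyperplane, leaves store memories, and leaves split when full so depth stays logarithmic. The routers are learned online in the paper; here they are random, and all names are ours:

```python
import numpy as np

class MemoryTree:
    """Simplified memory tree: internal nodes route via a hyperplane,
    leaves store (key, value) memories, splits keep depth logarithmic."""

    def __init__(self, dim, capacity=4, rng=None):
        self.rng = rng or np.random.default_rng(0)
        self.w = self.rng.normal(size=dim)  # router (learned in the paper; random here)
        self.items = []
        self.left = self.right = None
        self.dim, self.capacity = dim, capacity

    def insert(self, key, value):
        if self.left is None:  # at a leaf: store, split if over capacity
            self.items.append((key, value))
            if len(self.items) > self.capacity:
                self._split()
        else:
            (self.left if key @ self.w < 0 else self.right).insert(key, value)

    def _split(self):
        self.left = MemoryTree(self.dim, self.capacity, self.rng)
        self.right = MemoryTree(self.dim, self.capacity, self.rng)
        for k, v in self.items:
            (self.left if k @ self.w < 0 else self.right).insert(k, v)
        self.items = []

    def query(self, key):
        """Return the value of the nearest stored key in the routed leaf."""
        if self.left is None:
            return min(self.items, key=lambda kv: np.sum((kv[0] - key) ** 2))[1]
        child, other = ((self.left, self.right) if key @ self.w < 0
                        else (self.right, self.left))
        if child.left is None and not child.items:  # empty leaf: fall back
            child = other
        return child.query(key)
```

Both `insert` and `query` touch one root-to-leaf path, so with balanced splits each operation costs O(log n) router evaluations.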
Model-based RL in CDPs: PAC bounds and Exponential Improvements over Model-free Approaches
A theoretical comparison between model-based RL and model-free RL.
A sample efficient model-based RL algorithm.
Contrasting Exploration in Parameter and Action Space: A Zeroth-Order Optimization Perspective
Exploration in action space can be much more efficient than zeroth-order methods (which explore in parameter space) when the number of policy parameters is much larger than the dimension of the action space and the planning horizon.
Dual Policy Iteration
Leverages model-based control (i.e., iLQR) and reward-aware imitation learning (e.g., AggreVaTeD) to jointly boost policy improvement.
Truncated Horizon Policy Search: Combining Reinforcement Learning and Imitation Learning
Combination of IL & RL: uses the expert's value function as reward shaping to shorten the planning horizon, which in turn speeds up RL.
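The shaping step itself is a one-liner; a minimal sketch assuming the expert's value function V^e is given as a callable (names hypothetical):

```python
def shaped_reward(r, v_expert, s, a, s_next, gamma=0.99):
    """Potential-based reward shaping with the expert's value
    function as the potential.  Under the shaped reward, a
    truncated-horizon planner can approximate long-horizon planning."""
    return r(s, a) + gamma * v_expert(s_next) - v_expert(s)
```

Because the shaping terms telescope along a trajectory, the shaped return differs from the true return only by boundary terms, so the set of optimal policies is preserved (the classical potential-based shaping argument).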
Sketching for Kronecker Product Regression and P-splines
(by alphabetic order) Huaian Diao, Zhao Song, Wen Sun, David P. Woodruff
Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction
Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell
(also selected for oral presentation at RLDM 2017) [code]
Can be viewed as an actor-critic algorithm with the critic being the expert's state-action Q function; shows an exponential sample complexity separation between IL and pure RL.
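In actor-critic terms, the update resembles a sampled policy gradient with the expert's Q plugged in as the critic. A toy sketch with a softmax policy over discrete actions (all interfaces here are ours, not the paper's code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def aggrevated_style_grad(theta, features, q_expert, states, rng):
    """Sampled policy gradient where the critic is the expert's
    state-action value Q^e instead of a learned one (sketch)."""
    g = np.zeros_like(theta)
    for s in states:
        phi = features(s)                 # (n_actions, dim) action features
        probs = softmax(phi @ theta)
        a = rng.choice(len(probs), p=probs)
        grad_log = phi[a] - probs @ phi   # grad of log pi_theta(a | s)
        g += grad_log * q_expert(s, a)
    return g / len(states)
```

Ascending this gradient shifts probability mass toward actions the expert's Q function scores highly, without the learner having to fit its own critic.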
Safety-Aware Algorithms for Adversarial Contextual Bandit
Wen Sun, Debadeepta Dey, Ashish Kapoor
ICML, 2017 [slides]
Minimizing regret while keeping the average risk below a pre-specified safety threshold in the long term.
Gradient Boosting on Stochastic Data Streams
Hanzhang Hu, Wen Sun, Arun Venkatraman, Martial Hebert, Drew Bagnell
Learning to Filter with Predictive State Inference Machines
Wen Sun, Arun Venkatraman, Byron Boots, J. Andrew Bagnell.
Learning to recurrently predict the future.
Can be viewed as a recurrent structure whose hidden state encodes the information needed to accurately predict future observations.
Online Bellman Residual Algorithms with Predictive Error Guarantees
Wen Sun, Drew Bagnell
Best Student Paper Award [slides]
Adversarial online policy evaluation.
A reduction from adversarial policy evaluation to general no-regret & stable online learning.
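A caricature of the reduction, assuming a linear value function and online gradient descent as the no-regret learner (names and the toy setup are ours):

```python
import numpy as np

def online_bellman_residual(phi, transitions, lr=0.5):
    """Treat each observed transition as one round of online learning
    and run OGD on the squared (linear) Bellman residual; a no-regret
    guarantee for OGD translates into a predictive-error guarantee."""
    w = np.zeros(phi(transitions[0][0]).shape)
    for s, r, s_next, gamma in transitions:
        resid = phi(s) @ w - (r + gamma * (phi(s_next) @ w))
        w -= lr * 2 * resid * (phi(s) - gamma * phi(s_next))
    return w
```

Each transition defines one convex loss; since OGD is no-regret and stable, the averaged residual, and hence the prediction error of the learned value function, is controlled even when transitions arrive adversarially.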