Prospective students, please read this before contacting me.
I'm interested in machine learning, especially Reinforcement Learning. Much of my research is about designing algorithms for efficient sequential decision making, understanding the tradeoff between exploration and exploitation, and leveraging expert demonstrations to overcome the difficulty of exploration.
Recent Talk (MSR Feb 2019)
Corruption Robust Exploration in Episodic Reinforcement Learning
(by alphabetic order) Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
A general framework that enables (1) active action elimination in RL, and (2) provably robust exploration under adversarial corruptions on both rewards and
Policy Poisoning in Batch Reinforcement Learning and Control
Yuzhe Ma, Xuezhou Zhang, Wen Sun, Jerry Zhu
Optimal Sketching for Kronecker Product Regression and Low Rank Approximation
(by alphabetic order) Huaian Diao, Rajesh Jayaram, Zhao Song, Wen Sun, David P. Woodruff
Provably Efficient Imitation Learning from Observations Alone
J. Andrew Bagnell,
(Long Talk) [code]
Frame IL with observations alone as a sequence of two-player minmax games.
Polynomial sample complexity for learning a near-optimal policy is achievable in this setting.
Contextual Memory Tree
Hal Daume III,
(Long Talk) [code]
A learnable memory system maintained in a nearly balanced tree structure to ensure logarithmic time operations
Model-based RL in CDPs: PAC bounds and Exponential Improvements over Model-free Approaches
A formal definition of Model-free RL.
A theoretical comparison between model-based RL and model-free RL. A general & efficient model-based RL algorithm.
Contrasting Exploration in Parameter and Action Space: A Zeroth-Order Optimization Perspective
Exploration in action space can be much more efficient than zeroth-order methods when the number of policy parameters is much larger than the dimension of the action space and the planning horizon.
Dual Policy Iteration
Leverage model-based control (e.g., iLQR) and reward-aware Imitation Learning (e.g., AggreVaTeD) to doubly boost policy improvement
Truncated Horizon Policy Search: Combining Reinforcement Learning and Imitation Learning
Combination of IL & RL: use the expert's value function for reward shaping to shorten the planning horizon, which in turn speeds up RL
Sketching for Kronecker Product Regression and P-splines
(by alphabetic order) Huaian Diao, Zhao Song, Wen Sun, David P. Woodruff
Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction
Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell
(also selected for oral presentation at RLDM 2017) [code]
Can be viewed as an actor-critic algorithm with the critic being the expert's state-action Q function; exponential sample complexity separation between IL and pure RL
Safety-Aware Algorithms for Adversarial Contextual Bandit
Wen Sun, Debadeepta Dey, Ashish Kapoor
ICML, 2017 [slides]
Minimizing regret while keeping average risk below a pre-specified safety threshold in the long term
Learning to Filter with Predictive State Inference Machines
Wen Sun, Arun Venkatraman, Byron Boots, J. Andrew Bagnell
Learning to predict the future recurrently.
Can be viewed as a recurrent structure whose hidden state encodes information for accurately predicting the future
Online Bellman Residual Algorithms with Predictive Error Guarantees
Wen Sun, Drew Bagnell
Best Student Paper Award [slides]
Adversarial online policy evaluation. A reduction from adversarial policy evaluation to general no-regret & stable online learning.