CS 6789: Foundations of Reinforcement Learning
Modern Artificial Intelligence (AI) systems often need the ability to make sequential decisions in an unknown,
uncertain, possibly hostile environment, by actively interacting with the environment to collect relevant data.
Reinforcement Learning (RL) is a general framework that can capture the interactive learning setting and
has been used to design intelligent agents that achieve superhuman-level performance on
challenging tasks such as Go, computer games, and robotic manipulation.
This graduate-level course focuses on the theoretical and algorithmic foundations of Reinforcement Learning. The four main themes of the course are
(1) provably efficient exploration, (2) policy optimization (especially policy gradient), (3) control, and (4) imitation learning.
After taking this course, students will be able to understand both classic and state-of-the-art provably correct RL algorithms and their analysis. Students will also be able to conduct research on RL-related topics.

Staff
Instructors: Wen Sun (Cornell) and Sham Kakade (University of Washington)
TAs: Jonathan Chang
Lecture time: Tuesday/Thursday 3-4:15pm ET
Office hours: By Appointment
Contact: cornellcs6789@gmail.com.
Please communicate with the instructors and TA only through this account.
Course-related emails not sent to this address
will not be responded to in a timely manner.

Zoom Information
Zoom information has been posted on Piazza. If you are not enrolled or waitlisted (or you are not from Cornell) but want access,
please email cornellcs6789@gmail.com to ask for permission. We will make a decision based on the capacity of the class
and your research background (in your email, please briefly describe your research interests and your background in machine learning theory).

Prerequisites
This is an advanced and theory-heavy course: there are no programming assignments, and students
are required to work on a theory-focused course project.
Students need a strong grasp of Machine Learning (e.g., CS 4780), Probability and Statistics (e.g., BTRY 3080, ECON 3130, or MATH 4710), Optimization (e.g., ORIE 3300), and Linear Algebra (e.g., MATH 2940).
Undergraduate enrollment requires permission of the instructor and a minimum grade of A in CS 4780.

Grading Policies
Assignments 55% (HW0: 10%, HW1-HW3: 15% each) and Project 45%
All homework will be mathematical in nature, focusing on the theory of RL and bandits;
there will not be a programming component.
Each homework must be submitted as a single typed PDF document (not handwritten).
Passing HW0 at a satisfactory level is MANDATORY;
it checks your knowledge of the prerequisites in probability, statistics, and linear algebra.
Homework Rules:
Homework must be done individually: each student must understand, write, and hand in their own answers. It is
acceptable for students to discuss problems with each other;
it is not acceptable for students to look at another student's written answers.
You must also indicate on each homework with whom you collaborated and what online resources you used.
Late days: Homework must be submitted by the posted due date.
You are allowed up to 5 total LATE DAYS for the homework across the entire semester. These are deducted automatically if your assignment is late.
For example, each period of up to 24 hours by which an assignment is late
uses one late day (up to two late days). After your late days are used up,
late penalties apply: any assignment turned in late incurs a score reduction of 33% for each late day.
If an assignment is up to 24 hours late, it incurs a penalty of 33%;
if it is up to 48 hours late, it incurs a penalty of 66%;
if it is later than that, it receives no credit. We will track all your late days, and any deductions will be applied in computing the final grades.
If you are unable to turn in homework on time, aside from the permitted late days, then do not enroll in the course.
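As an illustration only, the late-day arithmetic above can be sketched in a few lines of Python. The function name late_outcome and its arguments are hypothetical, not part of any course tooling. The sketch makes two assumptions that the policy does not spell out: an assignment more than 48 hours late receives no credit even if late days remain, and when a student has fewer late days left than needed, the remaining late days are used first and the 33% penalty applies to each uncovered day.

```python
import math


def late_outcome(hours_late, late_days_left):
    """Return (late_days_used, penalty_percent) for one homework.

    Hypothetical helper; see the assumptions stated in the text above.
    """
    if hours_late <= 0:
        return 0, 0  # on time: nothing used, no penalty
    # Each started 24-hour period counts as one late day.
    days_late = math.ceil(hours_late / 24)
    if days_late > 2:
        # Assumption: beyond 48 hours there is no credit, and no late days are charged.
        return 0, 100
    # Assumption: use remaining late days first, then penalize uncovered days.
    used = min(days_late, late_days_left)
    uncovered = days_late - used
    return used, 33 * uncovered


# 30 hours late with 5 late days left: two late days used, no penalty.
print(late_outcome(30, 5))
# 30 hours late with no late days left: 66% penalty.
print(late_outcome(30, 0))
```

The instructors' own bookkeeping is authoritative; this sketch only restates the stated rules under the assumptions above.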

Course Project
Please see the course project page.

Diversity in STEM
While many academic disciplines have historically been dominated by one cross section of society,
the study of and participation in STEM disciplines is a joy that the instructors hope that everyone can pursue,
regardless of their socioeconomic background, race, gender, etc.
The instructors encourage students to both be mindful of these issues, and,
in good faith, try to take steps to fix them. You are the next generation here.

Course Notes: RL Theory and Algorithms
The course will be largely based on the working draft of
the book "Reinforcement Learning Theory and Algorithms", available here.
We will be updating these notes in V2
through the course of the term. If you find typos or errors, please let us
know. We would appreciate it!

Schedule (tentative)


Date | Lecture | Reading | Slides/HW
09/3/20 | Fundamentals: Markov Decision Processes | Ch.1 | Slides, Annotated slides, HW0
09/08/20 | Fundamentals: Policy Iteration and Value Iteration | Ch.1 | Slides, Annotated slides
09/10/20 | Fundamentals: Computational Complexity & The LP Formulation | Ch.1 | Slides, Annotated slides
09/15/20 | Fundamentals: Statistical Limits | Ch.2 | Slides, Annotated slides
09/17/20 | Exploration: Multi-Armed Bandit | Ch.1 | Slides, Annotated slides, Guest Lecturer: Thodoris Lykouris, HW1
09/22/20 | Exploration: Efficient Exploration in Tabular MDPs | Ch.6 | Slides, Annotated slides
09/24/20 | Fundamentals: Generalization in RL | Ch.4 | Slides, Annotated slides
09/29/20 | Exploration: Linear Bandits | Ch.5, Paper | Slides, Annotated slides
10/1/20 | Exploration: Efficient Exploration in Linear MDPs | Ch.7, Paper | Slides, Annotated slides
10/6/20 | Exploration: Efficient Exploration in Linear MDPs (continued) | Ch.7 | Slides, Annotated slides
10/8/20 | Exploration: Learning in Large Scale MDPs (Bellman Rank/Witness Rank) | Ch.8, Bellman Rank, Witness Rank | Slides, Annotated slides
10/13/20 | Policy Optimization: Policy Gradient (REINFORCE, Variance Reduction, Convergence) | Ch.9 | Slides, Annotated Slides, HW2
10/15/20 | Policy Optimization: Global Convergence? | Ch.10 | Slides, Annotated Slides
10/20/20 | Policy Optimization: Natural Policy Gradient (NPG) and its Global Convergence | Ch.10; Optional: Experts+MDPs | Slides, Annotated Slides
10/22/20 | Policy Optimization: NPG and Function Approximation | Ch.11 | Slides, Annotated Slides
10/27/20 | Policy Optimization: Trust Region Methods | Ch.12, Covariant Policy Search, TRPO | Slides, Annotated Slides, HW2 Due (Oct 30)
10/29/20 | Policy Optimization: Conservative Policy Iteration | Ch.3 + Ch.12, CPI | Slides, Annotated Slides
11/3/20 | Control: Basics of LQR (Riccati Equations) | Ch.13 | Slides, Annotated Slides
11/5/20 | Control: SDP formulation, Gauss-Newton/Policy Iteration, and a convex parameterization (System Level Synthesis) | Ch.13 | Slides, Annotated SDP Slides, Slides SLS, Annotated Slides SLS
11/10/20 | Batch RL: Fitted Q Iteration and Recent Advances | Note, Ch.15 | Annotated Slides, Guest Lecturer: Akshay Krishnamurthy
11/12/20 | Imitation Learning: Behavior Cloning, Distribution Shift, and Distribution Matching | Ch.14 | Slides, Annotated Slides, HW3 (Due Nov 24 11:59pm)
11/17/20 | No Class (semifinal week) | |
11/19/20 | No Class (semifinal week) | |
11/24/20 | No Class (semifinal week) | |
12/01/20 | Imitation Learning: Maximum Entropy Inverse RL | Ch.14 | Slides, Annotated Slides
12/03/20 | Imitation Learning: Interactive Learning (DAgger) | DAgger | Slides
12/08/20 | Student Project Presentations | |
12/10/20 | Student Project Presentations | |
12/15/20 | No Class | |

