CS 4789/5789: Introduction to Reinforcement Learning
Modern Artificial Intelligence (AI) systems often need the ability to make sequential decisions in an unknown,
uncertain, possibly hostile environment, by actively interacting with the environment to collect relevant data.
Reinforcement Learning (RL) is a general framework that can capture the interactive learning setting and
has been used to optimize generative AI models and to design intelligent agents that achieve super-human performance on
challenging tasks such as board games, computer games, and robotics.
This course focuses on the basics of Reinforcement Learning. The four main parts of the course are
(1) basics of Markov Decision Processes (MDPs), (2) planning and control in MDPs, (3) learning in MDPs, and (4) RL from human feedback.
After taking this course, students will be able to understand classic RL algorithms, their analysis, and their usage in modern AI applications.
All lectures will be math-heavy; we will go through algorithms and their analysis.
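To give a flavor of part (2), below is a minimal value iteration sketch on a toy two-state MDP. The MDP, its transition probabilities, and its rewards are made up purely for illustration; this is not part of the official course materials.

import numpy as np

# Toy 2-state, 2-action MDP (all numbers are hypothetical, for illustration only).
# P[s, a, s'] = probability of moving to state s' when taking action a in state s.
# R[s, a]     = expected immediate reward for taking action a in state s.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9  # discount factor

# Value iteration: repeatedly apply the Bellman optimality backup
#   V(s) <- max_a [ R(s, a) + gamma * sum_{s'} P(s' | s, a) V(s') ]
V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * (P @ V)      # Q[s, a], shape (2, 2)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print("Optimal state values:", V)
print("Greedy policy (action per state):", Q.argmax(axis=1))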
Staff
Instructor: Wen Sun
Lecture time: Monday/Wednesday 2:55pm - 4:10pm ET
Instructor office hours: Thursday 3-4pm, Gates 416b
TA office hours: see Ed Discussion
Contact: Ed Discussion
Please communicate with the instructor and TAs through Ed.
Emails sent to the instructor or TAs regarding the course
will not be responded to in a timely manner.
Recorded Lectures
The course does not provide recorded videos. You can find recorded lectures from Spring 2021 here,
but they are out of date.
Prerequisites
Since lectures are math-heavy and we will focus on algorithm design and analysis, we require students to have a strong machine learning background (e.g., CS 4780). Students should be comfortable with the basics of probability and linear algebra.
Since the homework contains programming problems, we also expect students to be comfortable with programming. We will use Python as the programming language in ALL homework assignments.
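To give a sense of the expected programming comfort level, below is a minimal sketch of an environment-interaction loop of the kind used in the programming assignments. It assumes the Gymnasium package and the CartPole-v1 environment, both chosen only for illustration; the actual assignments may use different environments and tooling.

import gymnasium as gym  # assumes the Gymnasium package is installed

env = gym.make("CartPole-v1")        # illustrative environment choice
obs, info = env.reset(seed=0)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()                          # random policy placeholder
    obs, reward, terminated, truncated, info = env.step(action)  # one environment step
    total_reward += reward
    done = terminated or truncated

print("Episode return:", total_reward)
env.close()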
Grading Policies
CS4789: Exams: 50%; Homework: 10%; Programming assignments: 40%
CS5789: Exams: 45%; Homework: 10%; Programming assignments: 35%; Paper comprehension: 10%
Undergraduates enrolled in 4789 may choose to do the paper comprehension assignments;
if completed, you will receive the higher of your grades under the two schemes above.
Homework: There will be a number of homework assignments throughout the course,
typically made available roughly one to two weeks before the due date.
The homework primarily focuses on theoretical aspects of the material and is intended to provide preparation for the exams.
Homework may be completed in groups of up to two. You are allowed two slip days per homework.
Programming: To provide hands-on learning with the methods we will discuss in class,
there are a number of programming projects throughout the course.
The projects may be completed solo or in a group of two. You are allowed two slip days per project.
Paper comprehension: Students enrolled in this course at the graduate level (i.e., enrolled in 5789)
are required to read the assigned research papers and complete the associated online quizzes.
Papers will be assigned roughly once every two to three weeks. You are allowed two slip days per quiz.
Exams: There will be two exams for this class, an evening prelim and a final exam.
Prelim:
Final:
For both the prelim and the final, we will usually arrange one (and only one) makeup exam during that week.
You are expected to be available during the prelim and final weeks (e.g., do not plan travel during those two weeks). No online exam will be given.
If you miss the prelim, you can use the final to cover the prelim. If you miss the final,
you can take an incomplete and take the exam next semester.
Diversity and Inclusiveness
While many academic disciplines have historically been dominated by one cross-section of society,
the study of and participation in STEM disciplines is a joy that the instructors hope everyone can pursue,
regardless of their socio-economic background, race, gender, etc.
We encourage students to both be mindful of these issues, and,
in good faith, try to take steps to fix them. You are the next generation here.
You should expect and demand to be treated by your classmates and the course staff with respect.
You belong here, and we are here to help you learn and enjoy this course.
If any incident occurs that challenges this commitment to a supportive and inclusive environment,
please let the instructors know so that the issue can be addressed. We are personally committed to this, and subscribe to the
Computer Science Department Values of Inclusion.
Honor Code
Collaboration is allowed only where explicitly stated.
Do not use forums like Course Hero or Chegg.
Properly cite whatever materials you use for your homework, including generative AI (e.g., GPT).
If you are unclear about whether some online material can be used, ask the instructor and TAs first.
Do not share your solutions within or outside the class at any time.
We take academic integrity extremely seriously. The above is not an exhaustive list, and in general any Cornell and common-sense rules about academic integrity apply. If it is not something we have explicitly allowed, ask us whether it is OK before you do it.
Cornell University Code of Academic Integrity,
CS Department Code of Academic Integrity.
Course Notes
The course will sometimes use a working draft of the book "Reinforcement Learning Theory and Algorithms", available here.
Note that this is an extremely advanced RL theory book; a lot of its material is beyond the scope of this class,
so we will pick very specific sections for you to read.
If you find typos or errors, please let us know. We would appreciate it!
You can also self-study the classic book "Reinforcement Learning: An Introduction", available here.
Schedule (tentative)
Date | Lecture | Reading | Slides/HW
01/22/25 | Introduction | AJKS: 1.1.1 | Slides
01/27/25 | Fundamentals: Markov Decision Processes | AJKS: 1.1.1 | Slides, Annotated Slides
01/29/25 | Fundamentals: Policy Evaluation | AJKS: 1.1.2 | Slides, Annotated Slides
02/03/25 | Fundamentals: Value Iteration | AJKS: 1.3.1 | Slides, Annotated Slides
02/05/25 | Fundamentals: Policy Iteration | AJKS: 1.3.2 | Slides, Annotated Slides
02/10/25 | Value-based Learning: Temporal Difference Learning | RL intro: 6.1 | Slides, Annotated Slides
02/12/25 | Value-based Learning: Q Learning | RL intro: 6.5 | Slides, Annotated Slides
02/19/25 | Model-based Learning: Tabular Model-based RL | | Slides, Annotated Slides
02/24/25 | ML recap: Supervised Learning, PyTorch, and Gym | |
02/26/25 | Value-based Learning: Deep Q Network (DQN) | Paper |
03/03/25 | Policy Optimization: REINFORCE | |
03/05/25 | Policy Optimization: Natural Policy Gradient | |
03/10/25 | Policy Optimization: REBEL | |
03/12/25 | Policy Optimization: Variance Reduction and Actor-Critic | |
03/17/25 | Policy Optimization: Proximal Policy Optimization (PPO) | |
03/19/25 | Midterm office hour | |
03/24/25 | Policy Optimization: TBD | |
03/26/25 | Policy Optimization: TBD | |
04/07/25 | RL for LLM: Direct Preference Optimization (DPO) | |
04/09/25 | RL for LLM: Reward modeling and online RL | |
04/14/25 | RL for LLM: Q#: solving KL-regularized RL | |
04/16/25 | Search: MCTS | |
04/21/25 | Case study: AlphaGo | |