CS 4789/5789: Introduction to Reinforcement Learning


Modern Artificial Intelligence (AI) systems often need the ability to make sequential decisions in an unknown, uncertain, possibly hostile environment by actively interacting with the environment to collect relevant data. Reinforcement Learning (RL) is a general framework that captures this interactive learning setting and has been used to optimize generative AI models and to design intelligent agents that achieve superhuman performance on challenging tasks such as board games, computer games, and robotics.

This course focuses on the basics of Reinforcement Learning. The four main parts of the course are (1) basics of Markov Decision Processes (MDPs), (2) planning and control in MDPs, (3) learning in MDPs, and (4) RL from human feedback.

After taking this course, students will be able to understand classic RL algorithms, their analysis, and their usage in modern AI applications.

All lectures will be math-heavy; we will go through algorithms and their analysis.

Staff


Instructor: Wen Sun

Lecture time: Monday/Wednesday 2:55pm - 4:10pm ET

Instructor office hours: Thursday 3-4 pm Gates 416b

TA office hours: see ED Discussion

Contact: ED Discussion

Please communicate with the instructor and TAs through ED. Emails sent to the instructor and TAs regarding the course will not be responded to in a timely manner.


Recorded Lectures

The course does not provide recorded videos. You can find recorded lectures from Spring 2021 here, but they are out of date.

Prerequisites


Since lectures are math-heavy and we will focus on algorithm design and analysis, we require students to have a strong Machine Learning background (e.g., CS 4780). Students should be comfortable with the basics of probability and linear algebra.

Since the homework contains programming problems, we expect students to be comfortable with programming. We will use Python as the programming language in ALL homework assignments.

Grading Policies


CS4789: Exams: 50%; Homework: 10%; Programming assignments: 40%

CS5789: Exams: 45%; Homework: 10%; Programming assignments: 35%; Paper comprehension: 10%

Undergraduates enrolled in 4789 may choose to do the paper comprehension assignments; if completed, you will receive the higher of your two grades between the above schemes.

Homework: There will be a number of homework assignments throughout the course, typically made available roughly one to two weeks before the due date. The homework primarily focuses on theoretical aspects of the material and is intended to provide preparation for the exams. Homework may be completed in groups of up to two. You are allowed two slip days per homework.

Programming: To provide hands-on experience with the methods we discuss in class, there are a number of programming projects throughout the course. The projects may be completed solo or in a group of two. You are allowed two slip days per project.

Paper comprehension: Students enrolled in this course at the graduate level (i.e., enrolled in 5789) are required to read assigned research papers and complete the associated online quiz. Papers will be assigned roughly once every two to three weeks. You are allowed two slip days per quiz.

Exams: There will be two exams for this class, an evening prelim and a final exam.

  • Prelim:
  • Final:
  • For both the prelim and the final, we will usually arrange one (and only one) makeup exam during the corresponding exam week. You are expected to be available during the prelim and final weeks (e.g., do not plan travel during those two weeks). No online exam will be given. If you miss the prelim, the final can be used to cover the prelim. If you miss the final, you can take an incomplete and take the exam next semester.



    Diversity and Inclusiveness

    While many academic disciplines have historically been dominated by one cross section of society, the study of and participation in STEM disciplines is a joy that the instructors hope that everyone can pursue, regardless of their socio-economic background, race, gender, etc. We encourage students to both be mindful of these issues, and, in good faith, try to take steps to fix them. You are the next generation here.

    You should expect and demand to be treated by your classmates and the course staff with respect. You belong here, and we are here to help you learn and enjoy this course. If any incident occurs that challenges this commitment to a supportive and inclusive environment, please let the instructors know so that the issue can be addressed. We are personally committed to this, and subscribe to the Computer Science Department Values of Inclusion.

    Honor Code

  • Collaborations only where explicitly allowed
  • Do not use forums like Course Hero or Chegg;
  • Properly cite whatever materials you use for your homework, including generative AI (e.g., GPT); if you are unclear about whether some online material can be used, ask the instructors and TAs first
  • No sharing of your solutions within or outside class at any time
  • We take academic integrity extremely seriously. The above is not an exhaustive list, and in general any Cornell and common-sense rules about academic integrity apply. If it is not something we have explicitly allowed, ask us whether it is OK before you do it.

    Cornell University Code of Academic Integrity, CS Department Code of Academic Integrity.

    Course Notes

    The course will sometimes use a working draft of the book "Reinforcement Learning Theory and Algorithms", available here.

    Note that this is an extremely advanced RL theory book. Much of the material in the book is outside the scope of this class, so we will pick very specific sections for you to read.

    If you find typos or errors, please let us know. We would appreciate it!

    You can also self-study the classic book "Reinforcement Learning: An Introduction", available here.

    Schedule (tentative)

    Date Lecture Reading Slides/HW
    01/22/25 Introduction AJKS: 1.1.1 Slides
    01/27/25 Fundamentals: Markov Decision Processes AJKS: 1.1.1 Slides, Annotated Slides
    01/29/25 Fundamentals: Policy Evaluation AJKS: 1.1.2 Slides, Annotated Slides
    02/03/25 Fundamentals: Value Iteration AJKS: 1.3.1 Slides, Annotated Slides
    02/05/25 Fundamentals: Policy Iteration AJKS: 1.3.2 Slides, Annotated Slides
    02/10/25 Value-based Learning: Temporal Difference Learning RL intro: 6.1 Slides, Annotated Slides
    02/12/25 Value-based Learning: Q Learning RL intro: 6.5 Slides, Annotated Slides
    02/19/25 Model-based Learning: Tabular Model-based RL Slides, Annotated Slides
    02/24/25 ML recap: Supervised Learning, PyTorch, and Gym
    02/26/25 Value-based Learning: Deep Q Network (DQN) Paper
    03/03/25 Policy Optimization: REINFORCE
    03/05/25 Policy Optimization: Natural Policy Gradient
    03/10/25 Policy Optimization: REBEL
    03/12/25 Policy Optimization: Variance reduction and Actor-Critic
    03/17/25 Policy Optimization: Proximal Policy Optimization (PPO)
    03/19/25 Midterm office hour
    03/24/25 Policy Optimization: TBD
    03/26/25 Policy Optimization: TBD
    04/07/25 RL for LLM: Direct Preference Optimization (DPO)
    04/09/25 RL for LLM: Reward modeling and online RL
    04/14/25 RL for LLM: Q#: solving KL-regularized RL
    04/16/25 Search: MCTS
    04/21/25 Case study: AlphaGo