Lecture 2: Dynamic Programming.
Markov decision processes (MDPs); value functions and Q-functions; value iteration and policy iteration; operator perspectives. Model-free policy-based and value-based methods; Monte Carlo (MC) methods and temporal-difference (TD) learning.
Slides: lecture 2 (2023).pdf
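As a rough illustration of the value iteration topic listed above, here is a minimal sketch on a made-up two-state, two-action MDP. It is not taken from the lecture slides: the transition table `TRANSITIONS`, the helper `q_value`, the discount factor of 0.9, and the stopping tolerance are all assumptions chosen for illustration. Each sweep applies the Bellman optimality backup V(s) <- max_a sum_s' P(s'|s,a) [r + gamma V(s')] until the values stop changing.

```python
# Hypothetical toy MDP (not from the lecture): TRANSITIONS[s][a] is a list of
# (probability, next_state, reward) tuples.
TRANSITIONS = [
    [  # state 0
        [(1.0, 0, 0.0)],  # action 0: stay in state 0, reward 0
        [(1.0, 1, 1.0)],  # action 1: move to state 1, reward 1
    ],
    [  # state 1
        [(1.0, 1, 2.0)],  # action 0: stay in state 1, reward 2
        [(1.0, 0, 0.0)],  # action 1: move to state 0, reward 0
    ],
]


def q_value(transitions, values, state, action, gamma):
    """Expected one-step return of taking `action` in `state`, then following `values`."""
    return sum(
        prob * (reward + gamma * values[next_state])
        for prob, next_state, reward in transitions[state][action]
    )


def value_iteration(transitions, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Repeat the Bellman optimality backup until the largest change is below tol."""
    n_states = len(transitions)
    values = [0.0] * n_states
    for _ in range(max_iters):
        new_values = [
            max(
                q_value(transitions, values, s, a, gamma)
                for a in range(len(transitions[s]))
            )
            for s in range(n_states)
        ]
        delta = max(abs(new - old) for new, old in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    # Greedy policy with respect to the (approximately) converged values.
    policy = [
        max(
            range(len(transitions[s])),
            key=lambda a: q_value(transitions, values, s, a, gamma),
        )
        for s in range(n_states)
    ]
    return values, policy


values, policy = value_iteration(TRANSITIONS)
print(values, policy)
```

For this toy problem the fixed point can be checked by hand: staying in state 1 forever is worth 2 / (1 - 0.9) = 20, and moving there from state 0 is worth 1 + 0.9 * 20 = 19, so the greedy policy picks action 1 in state 0 and action 0 in state 1.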