Section outline

    • Policy gradient methods II: NPG, sample-based NPG, TRPO, and exploration in policy gradients

    • Exercises on Value Iteration, Policy Iteration, Modified Policy Iteration, and Q-Learning