Inconsistent A2C convergence to optimal policy with 1 worker and no batches

by Skander Moalla
Hello,

Indeed, in deep RL, performance can be highly sensitive to the hyperparameters used. In the project, we give you a complete, fixed set of optimization hyperparameters (learning rate, optimizer, etc.) that we have tested across multiple seeds to confirm they meet the success criteria.
The runs shown on the slide were likely performed with different hyperparameters or implementation details (which, for the sake of that lecture, may not have been recorded or optimized).

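To illustrate what we mean by testing across multiple seeds, here is a minimal sketch of a seed sweep. The `train_a2c` function, the hyperparameter values, and the environment name are placeholders rather than the project's actual implementation or settings; the point is simply to rerun the same fixed configuration under several seeds and look at the spread of final returns instead of a single run.

```python
import random

import numpy as np
import torch

# Illustrative hyperparameter values only, not the project's actual settings.
HPARAMS = dict(learning_rate=7e-4, gamma=0.99, entropy_coef=0.01)


def set_seed(seed: int) -> None:
    """Seed every source of randomness used during training."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


def train_a2c(env_id: str, seed: int, **hparams) -> float:
    """Placeholder for your own A2C training loop.

    It should train the agent with the given hyperparameters and return a
    final evaluation metric, e.g. the mean undiscounted return over a fixed
    number of evaluation episodes. The dummy value below only keeps this
    sketch runnable.
    """
    return float(np.random.default_rng(seed).normal(loc=400.0, scale=50.0))


# Rerun the exact same configuration over several seeds and report the spread.
returns = []
for seed in (0, 1, 2, 3, 4):
    set_seed(seed)
    returns.append(train_a2c("CartPole-v1", seed=seed, **HPARAMS))

print(f"return over 5 seeds: {np.mean(returns):.1f} +/- {np.std(returns):.1f}")
```

Note that even with identical hyperparameters, some variance across seeds is expected; what matters is that the success criteria are met consistently across seeds, not on every single run.
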
Best,
The teaching team.