CS-456: MP1 - RND Normalization

Hello,

I have a question about the normalizations in part 3.4.

The project description states that the states should be normalized using only a running average, whereas the intrinsic reward should be normalized and additionally clamped between -5 and 5.

However, the article linked in the project description [Burda et al., 2018] seems to state the opposite: the intrinsic reward is only normalized with a running average, and it is next_state that is normalized and clamped between -5 and 5.
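
To make the two readings concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: I take "normalized" to mean subtracting a running mean and dividing by a running standard deviation, which neither reading spells out here, and the names RunningStat, clip, normalize_state and normalize_reward are hypothetical helpers, not part of the project code.

```python
import math


class RunningStat:
    """Running mean and variance of a scalar stream (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Population standard deviation; falls back to 1.0 while it is
        # undefined or zero, to avoid dividing by zero.
        if self.count < 2 or self.m2 == 0.0:
            return 1.0
        return math.sqrt(self.m2 / self.count)


def clip(x, low=-5.0, high=5.0):
    """Clamp x to the interval [low, high]."""
    return max(low, min(high, x))


def normalize_state(state, stats, clip_state):
    """Standardize each state component; optionally clamp to [-5, 5]."""
    out = []
    for x, s in zip(state, stats):
        z = (x - s.mean) / s.std
        out.append(clip(z) if clip_state else z)
    return out


def normalize_reward(r_int, stat, clip_reward):
    """Standardize the intrinsic reward; optionally clamp to [-5, 5]."""
    z = (r_int - stat.mean) / stat.std
    return clip(z) if clip_reward else z


# Reading A (project description): clamp the reward, not the state.
#   normalize_state(s, state_stats, clip_state=False)
#   normalize_reward(r, reward_stat, clip_reward=True)
# Reading B (my reading of the article): clamp next_state, not the reward.
#   normalize_state(s_next, state_stats, clip_state=True)
#   normalize_reward(r, reward_stat, clip_reward=False)
```

In both readings the clamp acts on the standardized value, not on the raw state or reward.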

Which one should we implement?

Thanks for your help,

Gaston Wolfart

Re: MP1 - RND Normalization

by Lucas Louis Gruaz - Monday, 27 May 2024, 09:28

Hello,

Yes, you are right. This is a mistake on our side, but both variants work fine for the given task. You can implement either of the two options; both will be considered correct.

Best,
Lucas

Re: MP1 - RND Normalization

by Maria Yuffa Meshcheryakova - Monday, 27 May 2024, 13:59

Dear Lucas,

I am slightly confused: aren't the states we are considering much smaller than 5 in magnitude (from -1.2 to 0.6 and from -0.07 to 0.07), unlike those of the authors, who explore different environments?

Thank you in advance for clarification!

Best wishes,

Maria