MP1 - RND Normalization

by Gaston Emil Wolfart -
Number of replies: 2

Hello,

I have a question about the normalizations in part 3.4.

The project description says that the states should be normalized using only a running average, and that the intrinsic reward should also be normalized, but additionally clamped between -5 and 5.

However, the article linked in the project description [Burda et al., 2018] seems to state the opposite: the intrinsic reward is normalized only with a running average, and it is the next_state that is normalized and clamped between -5 and 5.


Which one should we implement?

Thanks for your help,

Gaston Wolfart


In reply to Gaston Emil Wolfart

Re: MP1 - RND Normalization

by Lucas Louis Gruaz -

Hello,

Yes, you are right. This is a mistake on our side, but both variants work fine for the given task. You can implement either of the two options; both will be considered correct.
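For reference, here is a minimal sketch of the two options as they are described in this thread. It is only an illustration under assumptions, not the official solution: the helper `RunningNorm` and the functions `option_description` and `option_paper` are hypothetical names, and whitening by a running mean and standard deviation is assumed as the meaning of "normalized with a running average".

```python
import math


class RunningNorm:
    """Hypothetical helper: per-dimension running mean/std (Welford's algorithm)."""

    def __init__(self, dim, eps=1e-8):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, x):
        # Call once per observed sample during training.
        self.n += 1
        for i, v in enumerate(x):
            delta = v - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (v - self.mean[i])

    def std(self):
        if self.n < 2:
            return [1.0] * len(self.mean)  # assumption: no scaling until 2 samples
        return [math.sqrt(m / self.n) + self.eps for m in self.m2]

    def normalize(self, x, clip=None):
        out = [(v - m) / s for v, m, s in zip(x, self.mean, self.std())]
        if clip is not None:
            out = [max(-clip, min(clip, v)) for v in out]
        return out


def option_description(state_norm, reward_norm, next_state, raw_reward):
    """Project description: state normalized only; intrinsic reward clamped to [-5, 5]."""
    s = state_norm.normalize(next_state)
    r = reward_norm.normalize([raw_reward], clip=5.0)[0]
    return s, r


def option_paper(state_norm, reward_norm, next_state, raw_reward):
    """As summarized above for the paper: next_state clamped to [-5, 5]; reward normalized only."""
    s = state_norm.normalize(next_state, clip=5.0)
    r = reward_norm.normalize([raw_reward])[0]
    return s, r
```

In both variants the clamp is applied after normalization, and the running statistics would be updated with `update` on each new sample before normalizing.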

Best,
Lucas

In reply to Lucas Louis Gruaz

Re: MP1 - RND Normalization

by Maria Yuffa Meshcheryakova -

Dear Lucas,

I am slightly confused: aren't the states we are considering much smaller in magnitude than 5 (they range from -1.2 to 0.6 and from -0.07 to 0.07), unlike those of the authors, who explore different environments?

Thank you in advance for the clarification!

Best wishes,

Maria