Item 44239510

Does anyone know why they added minibatch advantage normalization (or when it can be useful)?

The paper they cite, "What matters in on-policy RL", claims it makes little difference on their suite of test problems, and normalizing by the minibatch mean doesn't seem theoretically motivated for convergence to the optimal policy.
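For concreteness, here is a minimal sketch of what I understand by minibatch advantage normalization, assuming the common zero-mean, unit-std form. The function name and the `eps` value are my own choices for illustration, not taken from the paper or any library.

```python
from statistics import fmean, pstdev

def normalize_minibatch_advantages(advantages, eps=1e-8):
    """Shift and scale advantages using statistics of this minibatch only.

    Assumption: zero-mean, unit-std normalization; eps guards against a
    zero standard deviation when all advantages are equal.
    """
    mean = fmean(advantages)
    std = pstdev(advantages)  # population std over the minibatch
    return [(a - mean) / (std + eps) for a in advantages]

# Every advantage here is positive, so every action was better than baseline.
minibatch = [1.0, 2.0, 3.0, 4.0]
normalized = normalize_minibatch_advantages(minibatch)

# After normalization the two smallest entries are negative, so those actions
# would now be pushed down instead of up.
print(normalized)
```

The sign flip is the part that bothers me: the subtracted mean depends on which samples happen to land in the minibatch, so it is not a fixed baseline.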

danielhanchen • 1 day ago

Tbh I'm unsure as well. I skimmed the paper, so if I find anything I'll post it here!