Nice experiments on PPO tricks. I've been trying to use PPO on the PyBullet environments, but I find that many of the tricks used in this repo are actually detrimental there. (I have created my own minimal version that works: https://github.com/arthur-x/SimplyPPO. One important difference is whether to clamp the sampled action before computing the log_prob. I find that clamping works better for BipedalWalker but hurts performance on PyBullet.)
Is this because the tricks are mainly tuned for MuJoCo? It would be nice if the author could provide a study on this.
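
To make the clamping question concrete, here is a minimal sketch of the two variants in plain Python for a 1-D Gaussian policy. It is not taken from this repo or from SimplyPPO: the names `gaussian_log_prob` and `store_action`, the bounds of [-1, 1], and the choice to always send the clipped action to the environment are assumptions made only for illustration.

```python
import math

def gaussian_log_prob(x, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    var = std * std
    return -0.5 * ((x - mean) ** 2 / var + math.log(2.0 * math.pi * var))

def store_action(raw, mean, std, low=-1.0, high=1.0, clamp_before_log_prob=False):
    """Return (env_action, stored_action, log_prob) for one sampled action.

    raw is the unclipped sample from the policy's Gaussian.
    Assumption: the environment always receives the clipped action.
    The flag only decides which value the log_prob is computed on
    (and which value goes into the rollout buffer).
    """
    env_action = min(max(raw, low), high)
    stored_action = env_action if clamp_before_log_prob else raw
    return env_action, stored_action, gaussian_log_prob(stored_action, mean, std)

# Hypothetical numbers: policy mean near the upper bound, sample lands outside it.
mean, std, raw = 0.9, 0.5, 1.6
_, a_raw, lp_raw = store_action(raw, mean, std, clamp_before_log_prob=False)
_, a_clamped, lp_clamped = store_action(raw, mean, std, clamp_before_log_prob=True)
print(a_raw, lp_raw)          # log_prob of the value that was actually sampled
print(a_clamped, lp_clamped)  # log_prob of the value the environment saw
```

The two variants only differ when a sample falls outside the bounds. In that case the clamped version scores the boundary value, which lies closer to the mean here and so gets a higher log_prob than the raw sample. Whichever value is stored has to be the same one used to recompute the log_prob during the PPO update, otherwise the probability ratio is inconsistent.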