
PPO Boid agent not learning

I have a custom Boid flocking environment in OpenAI Gym, trained with PPO from Stable-Baselines3. I want it to achieve flocking similar to Reynolds' model (Video), or close enough, but the agent isn't learning.

Reynolds' model

My Code

My results after 100000 timesteps of training:

My boids


I have adjusted the calculate_reward function my model uses so that the reward encourages behavior like Reynolds' model, but I can't see any apparent improvement.
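
For readers without access to the linked code, here is a minimal sketch of what a Reynolds-style per-boid reward could look like. It is not the calculate_reward from this post: the function name `flocking_reward`, its parameters, the weights, and the normalization are all assumptions chosen for illustration. It combines the three classic rules (cohesion, separation, alignment) and penalizes a boid that has no neighbors, so that simply flying apart is not rewarded.

```python
import math


def flocking_reward(positions, velocities, index,
                    neighbor_radius=10.0, min_distance=2.0,
                    w_cohesion=1.0, w_separation=1.0, w_alignment=1.0):
    """Hypothetical reward for one boid in 2D; all names and weights are assumptions."""
    px, py = positions[index]
    vx, vy = velocities[index]

    # Collect (neighbor index, distance) for every other boid within the radius.
    neighbors = []
    for j, (qx, qy) in enumerate(positions):
        if j == index:
            continue
        d = math.hypot(qx - px, qy - py)
        if d <= neighbor_radius:
            neighbors.append((j, d))

    # An isolated boid gets the worst cohesion and alignment scores.
    if not neighbors:
        return -(w_cohesion + w_alignment)

    n = len(neighbors)

    # Cohesion: negative distance to the neighbors' centroid, scaled to [-1, 0].
    cx = sum(positions[j][0] for j, _ in neighbors) / n
    cy = sum(positions[j][1] for j, _ in neighbors) / n
    cohesion = -math.hypot(cx - px, cy - py) / neighbor_radius

    # Separation: penalty that grows as a neighbor gets closer than min_distance.
    separation = -sum((min_distance - d) / min_distance
                      for _, d in neighbors if d < min_distance)

    # Alignment: cosine similarity between own velocity and mean neighbor velocity.
    ax = sum(velocities[j][0] for j, _ in neighbors) / n
    ay = sum(velocities[j][1] for j, _ in neighbors) / n
    speed = math.hypot(vx, vy)
    avg_speed = math.hypot(ax, ay)
    if speed > 0 and avg_speed > 0:
        alignment = (vx * ax + vy * ay) / (speed * avg_speed)
    else:
        alignment = 0.0

    return (w_cohesion * cohesion
            + w_separation * separation
            + w_alignment * alignment)
```

Calling `flocking_reward(positions, velocities, i)` with lists of `(x, y)` tuples returns the reward for boid `i`. Keeping each term roughly within a fixed range makes the weights easier to tune, though suitable values would have to be found experimentally for a given environment.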


  • I ran it for 2 million timesteps, and I can now see that they all just move away.

    Two insights: the training time was too short, and the reward function needs to be modified.

    2 million timesteps of training, 3000-step run