reinforcement-learning openai-gym stable-baselines openai-api

Rollout summary statistics not being monitored for CustomEnv using Stable-Baselines3


I am trying to train a custom environment using PPO via Stable-Baselines3 and OpenAI Gym. For some reason the rollout statistics are not being reported for this custom environment when I try to train the PPO model.

The code I am using is below (I have omitted the CustomEnv code for brevity):

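from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor

# CustomEnv, log_dir and models_dir are defined elsewhere (omitted here)
env = CustomEnv(mode="discrete")
env = Monitor(env, log_dir)  # records episode reward and length when an episode ends
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log=log_dir)

timesteps = 5000
for i in range(3):
    model.learn(total_timesteps=timesteps, reset_num_timesteps=False, tb_log_name="PPO")
    # Name each checkpoint after the cumulative number of timesteps trained so far
    model.save(f"{models_dir}/car_model_{timesteps * (i + 1)}")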

The image below compares the two outputs. On the left is the usual output from model.learn() applied to a dummy environment that I use for debugging, with rollout statistics being reported. On the right is my custom environment, where only the 'time' and 'train' statistics are being reported.

I have already tried adding the line of code:

env = Monitor(env, log_dir)

But that does not change the output.


Solution

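  • SOLVED: There was an edge case where the episode never ended, so the done variable remained False indefinitely. The rollout statistics (such as ep_len_mean and ep_rew_mean) are computed from completed episodes, so with no episode ever finishing there was nothing to report.

    After fixing this bug, the rollout statistics reappeared.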
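
For anyone hitting the same symptom, here is a minimal sketch of how one might check whether an environment ever terminates, and cap the episode length as a safety net. It is plain Python and assumes the classic Gym step API that returns (obs, reward, done, info); newer Gymnasium versions return separate terminated and truncated flags instead. NeverEndingEnv, StepCap and steps_until_done are hypothetical names made up for this illustration, not part of Gym or Stable-Baselines3, and NeverEndingEnv only stands in for the buggy CustomEnv, whose code was not posted.

```python
class NeverEndingEnv:
    """Hypothetical stand-in for the buggy CustomEnv: `done` is never True."""

    def reset(self):
        return 0

    def step(self, action):
        # Classic Gym API (assumed here): (observation, reward, done, info)
        return 0, 0.0, False, {}


class StepCap:
    """Hypothetical wrapper that forces `done` after `max_steps` steps per episode."""

    def __init__(self, env, max_steps):
        self.env = env
        self.max_steps = max_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_steps:
            done = True  # end the episode even if the env never signals it
        return obs, reward, done, info


def steps_until_done(env, budget=10_000, action=0):
    """Return the step at which `done` first becomes True, or None if it never does."""
    env.reset()
    for step in range(1, budget + 1):
        _, _, done, _ = env.step(action)
        if done:
            return step
    return None


print(steps_until_done(NeverEndingEnv(), budget=1000))                # None
print(steps_until_done(StepCap(NeverEndingEnv(), 200), budget=1000))  # 200
```

A None result from a check like this on the real environment would point to the same kind of bug as above. Gym also ships its own TimeLimit wrapper (gym.wrappers.TimeLimit with max_episode_steps), which is probably a better choice than a hand-rolled cap in real code, though fixing the termination condition itself is the actual cure.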