I am trying to train a PPO agent on a custom environment using Stable-Baselines3 and OpenAI Gym. For some reason, the rollout statistics are not reported for this custom environment when I train the PPO model.
The code I am using is below (I have omitted the code for CustomEnv for brevity):
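from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor

env = CustomEnv(mode="discrete")
env = Monitor(env, log_dir)  # records episode reward/length when an episode ends
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log=log_dir)
timesteps = 5000
for i in range(3):
    # Continue training without resetting the timestep counter between calls
    model.learn(total_timesteps=timesteps, reset_num_timesteps=False, tb_log_name="PPO")
    # Name each checkpoint after the number of timesteps trained so far
    model.save(f"{models_dir}/car_model_{timesteps * (i + 1)}")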
The image below shows the output of the above code on the right; the left side shows the usual output from a dummy environment that I use for debugging.
I have already tried adding the line of code:
env = Monitor(env, log_dir)
But that doesn't change the output.
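SOLVED: There was an edge case where the episode never ended, so the done variable remained False indefinitely.
As far as I can tell, the rollout statistics (ep_len_mean, ep_rew_mean) are computed from completed episodes, so nothing is reported until at least one episode finishes. After fixing this bug, the rollout statistics reappeared.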
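For anyone hitting the same symptom, here is a minimal sketch of how one might check whether an environment ever reports done. It uses only the standard library and assumes the older Gym-style step() that returns (obs, reward, done, info). StuckEnv, StepLimit, and episode_terminates are hypothetical names made up for illustration; they are not part of Gym or Stable-Baselines3, and StuckEnv is a stand-in for the buggy custom environment, not my actual code.

```python
import random


class StuckEnv:
    """Hypothetical toy environment (old Gym-style API) whose episode never ends."""

    def reset(self):
        return 0

    def step(self, action):
        # The bug being illustrated: done is never set to True.
        return 0, 0.0, False, {}


class StepLimit:
    """Hypothetical wrapper that forces done=True after max_steps steps."""

    def __init__(self, env, max_steps):
        self.env = env
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.steps += 1
        if self.steps >= self.max_steps:
            done = True  # cap the episode length
        return obs, reward, done, info


def episode_terminates(env, actions, max_steps=1000):
    """Return True if one random-action episode reports done within max_steps."""
    env.reset()
    for _ in range(max_steps):
        _, _, done, _ = env.step(random.choice(actions))
        if done:
            return True
    return False


print(episode_terminates(StuckEnv(), actions=[0, 1]))                  # False
print(episode_terminates(StepLimit(StuckEnv(), 200), actions=[0, 1]))  # True
```

If the check returns False for your environment, the termination logic is the first place to look. Fixing the termination condition itself (as I did) is the real solution; capping the episode length, for example with gym.wrappers.TimeLimit and its max_episode_steps argument, is a safety net rather than a fix.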