Plot entropy, avg rewards, etc. from Stable Baselines

I have built a custom environment for Stable Baselines by implementing the step, reset and render methods, but I don't know how to produce plots from it.

For instance, how many times my agent (in a Discrete action space) took action = 0, 1, 2, etc.

What signal the environment gave.

How the rewards changed over time.

I found out about results_plotter but couldn't find much information on it.

results_plotter.plot_results(["."], 10e6, results_plotter.X_TIMESTEPS, "Market rewards")

Solution

There is no pre-made tool for this at the moment. Have a look at the Monitor wrapper and how it tracks the episodic rewards. It generates a log file (monitor.csv) recording each episode's reward, length and elapsed time, which you can use to get some metrics out. This is your best bet imo.
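As a rough sketch of getting metrics out of that log: as far as I know the Monitor file starts with a '#'-prefixed JSON header line, followed by CSV rows with the columns r (episode reward), l (episode length) and t (elapsed time). The helper names load_monitor_log and moving_average below are my own, not part of Stable Baselines, and only the standard library is used.

```python
import csv
import json


def load_monitor_log(path):
    """Read a Monitor log; return (metadata, episode_rewards, episode_lengths).

    Assumes the layout described above: an optional '#'-prefixed JSON
    header line, then a CSV table with at least the columns 'r' and 'l'.
    """
    with open(path, newline="") as f:
        first = f.readline()
        if first.startswith("#"):
            metadata = json.loads(first[1:])
        else:
            # No header line: rewind so the CSV reader sees the column names.
            metadata = {}
            f.seek(0)
        rows = list(csv.DictReader(f))
    rewards = [float(row["r"]) for row in rows]
    lengths = [int(row["l"]) for row in rows]
    return metadata, rewards, lengths


def moving_average(values, window):
    """Trailing mean over the last `window` values (shorter at the start)."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            # Drop the value that just left the window.
            total -= values[i - window]
        out.append(total / min(i + 1, window))
    return out
```

The smoothed list can then go to any plotting library, for example matplotlib's plt.plot(moving_average(rewards, 100)), to see how the average reward per episode moves during training.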

I also recommend taking a look at Tensorboard, as it might provide some real-time info.
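For the per-action counts from the question, neither of these records them out of the box as far as I know, so one option is a thin wrapper around your environment that tallies actions and rewards as they pass through step. This is only a sketch: ActionStatsWrapper is a made-up name, it assumes step returns the Gym-style tuple (obs, reward, done, info) and that actions are single integers (Discrete space). It is a plain class so the example has no dependencies; with Stable Baselines you would probably subclass gym.Wrapper instead.

```python
from collections import Counter


class ActionStatsWrapper:
    """Hypothetical helper: wraps any env exposing reset() and step(action)."""

    def __init__(self, env):
        self.env = env
        self.action_counts = Counter()  # action -> number of times taken
        self.step_rewards = []          # reward received at every step

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        result = self.env.step(action)
        # Assumes a Discrete action that converts cleanly to int.
        self.action_counts[int(action)] += 1
        # Assumes the Gym convention: result[1] is the reward.
        self.step_rewards.append(float(result[1]))
        return result

    def __getattr__(self, name):
        # Forward everything else (action_space, render, ...) to the wrapped env.
        if name == "env":
            raise AttributeError(name)
        return getattr(self.env, name)
```

After training or evaluation, wrapper.action_counts tells you how often each action was chosen (a bar chart works well), and wrapper.step_rewards gives the raw per-step reward signal.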