After training a Stable Baselines3 RL algorithm (I am mainly using PPO), I want to set the exploration rate to 0 and turn off network training, so that I always get the same output (action) from the model when given the same input (observation). Is it possible to do that? If not, is there a reason why it should not be possible?
Setting deterministic to True when calling model.predict() seems to do the trick (it defaults to False):
model.predict(observation, deterministic=True)
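Two points that may help explain why this works. PPO has no separate exploration rate to set to 0: its exploration comes from sampling actions from the policy's output distribution, and deterministic=True takes the mode of that distribution instead. Also, predict() only runs inference and does not update the network, so there is no training to switch off. The sketch below is a toy illustration of that difference for a discrete action space, not Stable Baselines3's actual code. The names select_action and probs, and the probabilities themselves, are made up for the example.

```python
import random


def select_action(probs, deterministic=False, rng=random):
    """Toy stand-in for how a discrete stochastic policy picks an action."""
    if deterministic:
        # Mode of the categorical distribution: the most probable action.
        return max(range(len(probs)), key=lambda a: probs[a])
    # Otherwise sample an action in proportion to its probability.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical action probabilities produced by a policy for one observation.
probs = [0.2, 0.5, 0.3]
rng = random.Random(0)  # seeded only to make the sketch reproducible

# Same "observation" queried many times in each mode.
greedy = {select_action(probs, deterministic=True, rng=rng) for _ in range(1000)}
sampled = {select_action(probs, deterministic=False, rng=rng) for _ in range(1000)}

print("deterministic actions seen:", greedy)
print("sampled actions seen:", sampled)
```

With deterministic selection the same input always maps to the same action, while sampling can return different actions for the same input. For continuous action spaces the analogous deterministic choice is the mean of the Gaussian rather than an argmax.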