python, reinforcement-learning, stable-baselines

Is it possible to set the exploration rate to 0, and turn off network training for a Stable Baselines 3 algorithm?


After training a Stable Baselines 3 RL algorithm (mainly PPO in my case), I want to set the exploration rate to 0 and turn off network training, so that the model always returns the same output (action) for the same input (observation). Is that possible? If not, is there a reason why it should not be?


Solution

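  • Pass deterministic=True to model.predict() (it defaults to False). PPO has no exploration-rate parameter to set to 0: it explores by sampling from its policy distribution, and deterministic=True returns the most likely action instead of a sample. predict() only runs inference and never updates the network (training happens in model.learn()), so there is nothing extra to switch off: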

    # predict() returns a tuple: the action and the (recurrent) state
    action, _states = model.predict(observation, deterministic=True)
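
To see why the flag makes the output repeatable, here is a toy sketch of the idea for a discrete action space. It is not Stable Baselines 3 code: `select_action` and the probabilities are made up for illustration, and it only assumes that the policy outputs a probability for each action.

```python
import random


def select_action(probs, deterministic=False, rng=random):
    """Pick an action index from a categorical distribution.

    Hypothetical helper for illustration; not part of Stable Baselines 3.
    """
    if deterministic:
        # Take the mode: the same probabilities always give the same action.
        return max(range(len(probs)), key=lambda i: probs[i])
    # Sample from the distribution: repeated calls can give different actions.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Made-up action probabilities for a single observation.
probs = [0.2, 0.5, 0.3]

rng = random.Random(0)  # seeded only so the sketch is reproducible
deterministic_actions = {select_action(probs, deterministic=True) for _ in range(100)}
sampled_actions = {select_action(probs, rng=rng) for _ in range(100)}

print("deterministic:", deterministic_actions)  # always the single action 1
print("sampled:", sampled_actions)
```

For continuous action spaces the same idea applies, with the mean of the Gaussian policy playing the role of the most likely action.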