reinforcement-learning snapshot stable-baselines

Resume training for a Stable Baselines PPO model


I wonder if I can resume training from checkpoints after saving logs in Stable Baselines. From what I understood from the documentation, CheckpointCallback can work as snapshots (though I am not sure about this).

This code creates logs, but I am not sure whether I can use them to resume training, or whether that is even possible with this method:

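import os
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback
controller = PPO('MlpPolicy', environment, verbose=0, clip_range=0.15, device='auto', learning_rate=0.00001)
checkpoint_callback = CheckpointCallback(save_freq=1000, save_path=os.path.join(output_dir, 'logs'), name_prefix='rl_model')  # writes rl_model_<steps>_steps.zip
controller.learn(total_timesteps=int(timesteps), callback=checkpoint_callback)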

I tried using tensorboard_log, since that is what I found clear explanations for (saving a model and resuming training), but I get the error zsh: illegal hardware instruction. I never got any output from it or a link to where I can monitor my model, and I couldn't find many solutions for this either.

I hope someone can help. If what I am doing is wrong, please let me know the best way to do this.

Thank you


Solution

  • Every callback that overrides the _on_step method (such as CheckpointCallback) returns a boolean that tells learn whether training should continue. CheckpointCallback always returns True, so training carries on automatically after each checkpoint is saved. If you want to stop training early instead, write a callback whose _on_step returns False under some condition, e.g., when the number of episodes exceeds a threshold.

    As for tensorboard_log: it only records your training metrics; it does not save the model.
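
To make the resume step concrete, here is a minimal sketch. It assumes Stable-Baselines3 and that the checkpoints follow the rl_model_<steps>_steps.zip naming that CheckpointCallback uses. latest_checkpoint and resume_training are hypothetical helper names introduced for illustration, not library functions, and env stands for whatever environment you trained on. The optional max_episodes argument uses the built-in StopTrainingOnMaxEpisodes callback as one example of a callback that stops training by returning False.

```python
import re
from pathlib import Path


def latest_checkpoint(log_dir, name_prefix="rl_model"):
    """Return the checkpoint file with the highest step count, or None."""
    pattern = re.compile(rf"^{re.escape(name_prefix)}_(\d+)_steps\.zip$")
    best_steps, best_path = -1, None
    for path in Path(log_dir).glob("*.zip"):
        match = pattern.match(path.name)
        if match and int(match.group(1)) > best_steps:
            best_steps, best_path = int(match.group(1)), path
    return best_path


def resume_training(log_dir, env, extra_timesteps, max_episodes=None):
    """Load the newest checkpoint and train for extra_timesteps more steps."""
    # Imported here so latest_checkpoint() works without loading SB3/torch.
    from stable_baselines3 import PPO
    from stable_baselines3.common.callbacks import (
        CheckpointCallback,
        StopTrainingOnMaxEpisodes,
    )

    path = latest_checkpoint(log_dir)
    if path is None:
        raise FileNotFoundError(f"no checkpoint found in {log_dir}")

    # Restores the saved policy parameters and attaches the environment.
    model = PPO.load(path, env=env)

    # Keep writing checkpoints while the resumed run is training.
    callbacks = [
        CheckpointCallback(save_freq=1000, save_path=str(log_dir), name_prefix="rl_model")
    ]
    if max_episodes is not None:
        # Built-in callback whose _on_step returns False once the limit is hit.
        callbacks.append(StopTrainingOnMaxEpisodes(max_episodes=max_episodes))

    # reset_num_timesteps=False keeps the step counter running from the checkpoint,
    # so extra_timesteps is added on top of the steps already trained.
    model.learn(
        total_timesteps=extra_timesteps,
        callback=callbacks,
        reset_num_timesteps=False,
    )
    return model
```

With the code from the question, something like resume_training(os.path.join(output_dir, 'logs'), environment, 100_000) would pick up from the last saved snapshot.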