I trained a model (LeNet-5) for 10 epochs and saved it, then loaded the checkpoint into two new models, 'new_model' and 'new_model2'. Here is the Colab link: https://colab.research.google.com/drive/1qQhyTWNzCgMYn8t0ZtIZilLgk4JptbJG?usp=sharing

I trained the new models for 5 epochs but ended up with different train and test accuracies at each epoch, despite loading from the same checkpoint and applying the reproducibility settings.

When I continue training the original model for 5 more epochs, its results also differ from those of the two new models.

Is it possible for the train and test accuracies of the original model (15 epochs) and the two new models (5 epochs each after loading the checkpoint) to be the same?

(Immediately after loading the checkpoint I get the same test accuracy for all three models, but the results deviate once I train each model further.)
You should reset all the seeds to a fixed value right before every experiment you launch. In short, this should be the order:

1. Set all the seeds.
2. Instantiate your models and load the checkpoint into them.
3. Set all the seeds again.
4. Train.

Reusing some of your code, we can define a function to set the seed, which should be called with the same value in steps 1 and 3:
import os
import random
import numpy as np
import torch as th

def set_seed(s):
    # Seed Python, NumPy and PyTorch (CPU and all GPUs) with the same value
    th.manual_seed(s)
    th.cuda.manual_seed_all(s)
    # Force cuDNN to use deterministic algorithms instead of benchmark-selected ones
    th.backends.cudnn.deterministic = True
    th.backends.cudnn.benchmark = False
    np.random.seed(s)
    random.seed(s)
    os.environ['PYTHONHASHSEED'] = str(s)
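To make the effect concrete, here is a minimal, self-contained sketch of that order. The tiny model and training loop below are hypothetical stand-ins for your LeNet-5 and your Colab training code, not names taken from your notebook:

import torch as th
import torch.nn as nn

# Hypothetical stand-ins for your LeNet-5 and training loop (not from your notebook).
def make_model():
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

def train(model, epochs):
    opt = th.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(epochs):
        data = th.randn(64, 1, 28, 28)        # random batch standing in for your DataLoader
        target = th.randint(0, 10, (64,))
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(data), target)
        loss.backward()
        opt.step()
    return loss.item()

set_seed(0)                                   # step 1: fix all seeds
new_model = make_model()                      # step 2: build the models and give them the same weights
new_model2 = make_model()
new_model2.load_state_dict(new_model.state_dict())
set_seed(0)                                   # step 3: reset the seeds right before training
loss1 = train(new_model, epochs=5)
set_seed(0)                                   # reset again so the second run sees the same RNG stream
loss2 = train(new_model2, epochs=5)
print(loss1 == loss2)                         # True: identical weights + identical randomness

Because the seeds are reset to the same value immediately before each training run, both runs consume an identical stream of random numbers (batch order, dropout, augmentation, and so on), so their losses and accuracies match epoch by epoch. If you only seed once at the top of the notebook, each model starts training from a different point in the RNG stream, which is why your three models diverge.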