Tags: python, tensorflow, neural-network, resuming-training

Does Re-Compiling reset the model's weights?


Looking at Daniel Möller's answer to this question, I understand that recompiling a trained model should not affect or change the already-trained weights. However, whenever I recompile my model to train it further with, say, a different learning rate or batch size, the val_mse starts at a higher (worse) value than it had reached by the end of the initial training.

Although it eventually decreases back to the val_mse reached before, I am not sure whether recompiling the model is simply resetting it and retraining from scratch.

Could someone confirm whether recompiling actually restarts the learning process from scratch? Also, is it common practice (or a good idea) to follow the initial training of a model with secondary training phases that use different hyper-parameters?


Solution

  • At the end of the first training run the weights will of course have changed. A likely reason why you see a drop in performance in the early epochs, before it possibly improves later, is that some optimization methods keep internal state that adapts over time: a step size that decreases as you converge, momentum estimates that accumulate, and so on. After training, that internal state typically does not allow stepping far away from where the model sits, since the model is believed to be close to optimal, so the optimizer only fine-tunes. When you restart training from scratch, the method typically allows much bigger steps early on to speed up initial convergence, on the assumption that the model is far from optimal. In your case you start close to the optimum but allow the algorithm to take a large step, which will likely carry it to a much worse point.

    If you don't want this to happen, you'll need to dig into the internals of your optimization method, for instance by adjusting hyper-parameters in place instead of recompiling (see the sketches below). Whether it is a good idea to do so? As usual in ML there is no one-size-fits-all answer; it depends on many factors, so try it and see for your own specific case.
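
    To make the effect concrete, here is a minimal sketch (assuming TensorFlow 2.x Keras; the toy model and data are hypothetical) showing that model.compile() leaves the layer weights untouched but hands training over to a fresh optimizer whose internal state, such as Adam's moment estimates and step counter, starts from zero:

        import numpy as np
        import tensorflow as tf

        # Toy regression data and model, purely for illustration.
        x = np.random.rand(256, 4).astype("float32")
        y = np.random.rand(256, 1).astype("float32")

        model = tf.keras.Sequential([
            tf.keras.Input(shape=(4,)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
        model.fit(x, y, epochs=3, batch_size=32, verbose=0)

        weights_before = model.get_weights()
        print(model.optimizer.iterations.numpy())   # > 0: optimizer has stepped

        # Recompile with a different learning rate, as in the question.
        model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")

        # The trained weights survive the recompile unchanged...
        print(all(np.array_equal(a, b)
                  for a, b in zip(weights_before, model.get_weights())))  # True

        # ...but the new optimizer's state is blank, so its first steps
        # can overshoot and temporarily worsen val_mse.
        print(model.optimizer.iterations.numpy())   # 0: fresh optimizer state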
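
    And if the goal is only to lower the learning rate for a second training phase, one way to avoid the state reset entirely is to update the optimizer's learning rate in place rather than recompiling. A sketch, assuming the learning rate was passed as a plain float (so Keras stores it as a variable rather than a schedule):

        # Continue training with a smaller step size while keeping Adam's
        # accumulated moment estimates and step counter intact.
        model.optimizer.learning_rate.assign(1e-4)
        model.fit(x, y, epochs=3, batch_size=32, verbose=0)

    Because no new optimizer is created here, the second phase picks up exactly where the first left off, which is usually what you want when fine-tuning with a reduced learning rate.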