machine-learning mxnet

Is there any way to preserve the internal variables of a trainer in MXNet?


I wrote a program that implements an algorithm called distributed randomized gradient descent (DRGD). The algorithm keeps some internal variables that are used to calculate the step lengths. Real training algorithms are likely to be much more complex than DRGD, so they will have even more internal variables. If these variables could be preserved, we could pause training, test the model, and then resume training where it left off.


Solution

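  • Yes. With MXNet Gluon, you can save the trainer's state and later resume training by calling .save_states() and .load_states() on the gluon.Trainer instance. These calls store the optimizer's internal state (for example, Adam's running averages); the model parameters themselves are saved separately, e.g. with net.save_parameters().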

    Here is an example:

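    from mxnet import gluon

    # `net` is assumed to be an already-initialized gluon.Block
    trainer = gluon.Trainer(net.collect_params(), 'adam')
    trainer.save_states('training.states')  # write the optimizer state to disk
    trainer.load_states('training.states')  # restore it before resuming training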
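
If the algorithm (such as DRGD) keeps its state outside gluon.Trainer, the same save/load pattern can be written by hand. The following is only a minimal sketch in plain Python: MomentumSGD, train, and demo are hypothetical names invented for illustration, the update rule is ordinary momentum SGD rather than DRGD, and nothing here is part of the MXNet API. It pickles the optimizer's internal variables and checks that a paused-and-resumed run ends at the same point as an uninterrupted one.

```python
import os
import pickle
import tempfile


class MomentumSGD:
    """Hypothetical optimizer with internal state (not DRGD, not MXNet)."""

    def __init__(self, lr=0.1, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.velocity = 0.0  # internal variable used to compute each step
        self.num_steps = 0   # internal variable: number of updates so far

    def step(self, w, grad):
        # Momentum update: the step depends on the stored velocity.
        self.velocity = self.momentum * self.velocity - self.lr * grad
        self.num_steps += 1
        return w + self.velocity

    def save_states(self, fname):
        # Persist only the internal variables, not the model weight.
        with open(fname, "wb") as f:
            pickle.dump({"velocity": self.velocity,
                         "num_steps": self.num_steps}, f)

    def load_states(self, fname):
        with open(fname, "rb") as f:
            state = pickle.load(f)
        self.velocity = state["velocity"]
        self.num_steps = state["num_steps"]


def train(opt, w, steps):
    # Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
    for _ in range(steps):
        w = opt.step(w, 2.0 * (w - 3.0))
    return w


def demo():
    # Reference run: 10 uninterrupted steps.
    w_ref = train(MomentumSGD(), 0.0, 10)

    # Paused run: 5 steps, then save the optimizer state.
    opt = MomentumSGD()
    w_half = train(opt, 0.0, 5)
    path = os.path.join(tempfile.mkdtemp(), "training.states")
    opt.save_states(path)

    # Resume with a fresh optimizer object that reloads the saved state.
    resumed = MomentumSGD()
    resumed.load_states(path)
    w_resumed = train(resumed, w_half, 5)

    # For contrast: continue from the same weight without restoring the state.
    w_cold = train(MomentumSGD(), w_half, 5)
    return w_ref, w_resumed, w_cold


if __name__ == "__main__":
    w_ref, w_resumed, w_cold = demo()
    print("uninterrupted:", w_ref)
    print("resumed with state:", w_resumed)
    print("resumed without state:", w_cold)
```

In this sketch the weight `w_half` is carried over in memory; in a real pause/resume workflow it would be saved and reloaded alongside the optimizer state. The run that skips load_states diverges from the reference, which is why the internal variables need to be preserved.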