I am looking for the correct and most efficient way to save, load, and retrain a model in LibTorch (C++), including both the model and optimizer state dicts. I believe I have everything set up correctly (I am certain about saving and loading the model state dict; less so about the optimizer state dict). My remaining question is about where to construct the optimizer and hand it the model parameters.
Saving Model and Optimizer:
// Save model state
torch::serialize::OutputArchive output_model_archive;
myModel.to(torch::kCPU);
myModel.save(output_model_archive);
output_model_archive.save_to(model_state_dict_path);
// Save optim state
torch::serialize::OutputArchive output_optim_archive;
myOptimizer->save(output_optim_archive);
output_optim_archive.save_to(optim_state_dict_path);
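(If I understand the serialization correctly, moving the model to the CPU before saving keeps the checkpoint device-independent, so it can be reloaded later on a machine without the original GPU.)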
Loading model and optimizer state for retraining:
// Load model state
torch::serialize::InputArchive input_model_archive;
input_model_archive.load_from(model_state_dict_path);
myModel.load(input_model_archive);
// Load optim state (note: a second archive with its own name,
// since redeclaring input_archive in the same scope will not compile)
torch::serialize::InputArchive input_optim_archive;
input_optim_archive.load_from(optim_state_dict_path);
myOptimizer->load(input_optim_archive);
When creating the optimizer object, you need to give it the model parameters:
std::shared_ptr<torch::optim::Optimizer> myOptimizer =
    std::make_shared<torch::optim::Adam>(myModel.parameters(), torch::optim::AdamOptions(LR));
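(Keeping the pointer typed as the base torch::optim::Optimizer lets me swap in a different optimizer later without touching the training loop.)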
Should the optimizer construction happen before the state dicts are loaded, after, or does it matter? For example, I am currently doing it like this:
// Setup model and optimizer object, set model params in optimizer
// Load state dictionaries...
// Train epoch #n...
myOptimizer->step();
// Save state dictionaries
To answer my own question: the model state dict needs to be loaded first, then the optimizer is constructed with the freshly loaded model parameters, and only then is the state dict loaded into the optimizer object. As far as I can tell, the optimizer keeps references to the specific parameter tensors it was constructed with, so those must already be the restored ones when its state is loaded.
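Here is a minimal sketch of that ordering (Net is a placeholder for your torch::nn::Module subclass; LR and the path variables are the same as above):

#include <torch/torch.h>

// 1. Build the model and restore its state dict
Net myModel;  // Net: placeholder for your torch::nn::Module subclass
torch::serialize::InputArchive model_archive;
model_archive.load_from(model_state_dict_path);
myModel.load(model_archive);

// 2. Construct the optimizer over the freshly restored parameters
std::shared_ptr<torch::optim::Optimizer> myOptimizer =
    std::make_shared<torch::optim::Adam>(myModel.parameters(),
                                         torch::optim::AdamOptions(LR));

// 3. Only now restore the optimizer state dict
torch::serialize::InputArchive optim_archive;
optim_archive.load_from(optim_state_dict_path);
myOptimizer->load(optim_archive);

// 4. Train as usual
// ... forward / backward ...
myOptimizer->step();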
My use case was a little more complicated: I was aggregating gradients from multiple nodes where training was happening and performing the optimizer step on a "master" node. I simplified the problem for the question, and I had assumed that, since I was only aggregating gradients, I did not need the previous optimizer state dict. That assumption was incorrect. The flow looks like:
// Load model state dict
// Aggregate gradients
// Load Optimizer state dict / params into optim
// Step
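For completeness, here is a rough sketch of the master-node step under that flow. receive_worker_gradients() is a hypothetical placeholder for whatever transport delivers the already summed/averaged worker gradients, one tensor per parameter in myModel.parameters() order:

// Load the latest model state dict into the master copy
torch::serialize::InputArchive model_archive;
model_archive.load_from(model_state_dict_path);
myModel.load(model_archive);

// Write the aggregated gradients into the loaded parameters
// (receive_worker_gradients() is a hypothetical helper; it returns one
// already-aggregated gradient tensor per parameter, in parameters() order)
std::vector<torch::Tensor> params = myModel.parameters();
std::vector<torch::Tensor> grads = receive_worker_gradients();
for (size_t i = 0; i < params.size(); ++i) {
    params[i].mutable_grad() = grads[i];
}

// Construct the optimizer over those parameters, then restore its state
std::shared_ptr<torch::optim::Optimizer> myOptimizer =
    std::make_shared<torch::optim::Adam>(params, torch::optim::AdamOptions(LR));
torch::serialize::InputArchive optim_archive;
optim_archive.load_from(optim_state_dict_path);
myOptimizer->load(optim_archive);

// Step with the aggregated gradients, then save both state dicts
// again for the next round
myOptimizer->step();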