python image-processing pytorch resuming-training

How to resume PyTorch training of a deep learning model after it stopped due to a power failure or some other interruption


I am training a deep learning model and want to save checkpoints of it. When training stops because of a power outage, I need to resume from the point where it was interrupted: for example, if 10 epochs have completed, I want to start again from epoch 11 with the parameters saved at that point.


Solution

  • In PyTorch, you can resume from a specific point by reading the epoch key from the checkpoint dictionary. This assumes the checkpoint was saved as a dictionary with 'model' and 'epoch' entries:

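    import torch

    # Load the checkpoint; map_location="cpu" lets it load even without the original GPU
    checkpoint = torch.load("checkpoint.pth", map_location="cpu")
    # model must already be constructed with the same architecture
    model.load_state_dict(checkpoint['model'])
    # 'epoch' holds the last completed epoch, so continue from the next one
    start_epoch = checkpoint['epoch'] + 1

    # Resume training from the epoch after the saved one
    for epoch in range(start_epoch, num_epochs):
        ...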
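
  • For resuming to work, a checkpoint has to be written during training, for instance at the end of every epoch. The following is a minimal sketch of both halves, not a definitive implementation. The helper names (save_checkpoint, load_checkpoint, train), the toy nn.Linear model, the random data, and the 'optimizer' key are all assumptions made for illustration; only the 'model' and 'epoch' keys come from the snippet above. Saving the optimizer state as well is usually worthwhile, because optimizers with momentum or adaptive learning rates otherwise restart from scratch.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(path, model, optimizer, epoch):
    """Write model/optimizer state plus the index of the last completed epoch."""
    tmp_path = path + ".tmp"
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),  # assumed extra key, not in the original
            "epoch": epoch,
        },
        tmp_path,
    )
    # Rename only after the write finished, so an interruption mid-write
    # leaves the previous checkpoint file untouched.
    os.replace(tmp_path, path)


def load_checkpoint(path, model, optimizer):
    """Restore state in place and return the epoch to start from (0 if no file)."""
    if not os.path.exists(path):
        return 0
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    return checkpoint["epoch"] + 1


def train(path, num_epochs, stop_after=None):
    """Toy training loop; stop_after simulates an interruption after that epoch."""
    torch.manual_seed(0)
    model = nn.Linear(4, 1)  # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    inputs, targets = torch.randn(16, 4), torch.randn(16, 1)  # placeholder data

    start_epoch = load_checkpoint(path, model, optimizer)
    epochs_run = []
    for epoch in range(start_epoch, num_epochs):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(inputs), targets)
        loss.backward()
        optimizer.step()
        save_checkpoint(path, model, optimizer, epoch)
        epochs_run.append(epoch)
        if stop_after is not None and epoch == stop_after:
            break
    return epochs_run


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
    print(train(ckpt, num_epochs=5, stop_after=2))  # interrupted run
    print(train(ckpt, num_epochs=5))                # resumed run
```

    Calling train a second time with the same path picks up at the epoch after the last one saved, because load_checkpoint returns checkpoint["epoch"] + 1. If you use a learning-rate scheduler or need reproducible random state, those would have to be stored in the dictionary in the same way.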