
PyTorch — proper way to compute loss on GPU?


What is the proper way to handle loss values with PyTorch CUDA? For example:

  1. Should I store the loss value in GPU?
  2. How do I move the loss value to GPU?
  3. How do I update the loss value on GPU?

Inside __init__():

self.device = torch.device('cuda')
self.model = self.model.to(self.device)
self.total_loss = torch.Tensor([0]).to(self.device)

For each batch:

self.loss1 = torch.Tensor(y_true - y_pred)
self.loss2 = 0.5 # some other loss
self.total_loss = self.loss1 + self.loss2
self.total_loss.backward()

Solution

  • TL;DR: Your loss is probably on the GPU already.


    You need to place all the data you use for computing the loss on the GPU manually. That primarily includes the model inputs and the ground-truth outputs. Usually you load these using a data loader and then move them to the GPU, as demonstrated in this PyTorch tutorial.
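
    As a rough illustration of that pattern, here is a minimal sketch of a training loop that moves each batch to the model's device. The toy dataset, the MSELoss criterion, the SGD optimizer and the CPU fallback are assumptions made for the example; they are not taken from the question.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Assumption: fall back to CPU so the sketch also runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy data and model, purely for illustration.
dataset = TensorDataset(torch.zeros(8, 2), torch.ones(8, 1))
loader = DataLoader(dataset, batch_size=4)
model = torch.nn.Linear(2, 1).to(device)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for x, y_true in loader:
    # The loader yields CPU tensors here; move each batch to the model's device.
    x, y_true = x.to(device), y_true.to(device)
    optimizer.zero_grad()
    # The loss is computed from tensors on `device`, so it lives there too.
    loss = criterion(model(x), y_true)
    loss.backward()
    optimizer.step()
```

    The loss is never moved explicitly: it ends up on whichever device its inputs are on.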

    Now, for your case, let's see what happens when we don't move tensors to the GPU, and which tensors are already on the GPU.

    import torch
    
    # this is just for demo
    model = torch.nn.Linear(2, 1)
    x = torch.zeros((1, 2))
    y_true = torch.ones((1, 1))
    
    # Here the interesting stuff starts...
    # We cannot query the model for its device directly,
    # but in this case we can check the weight matrix.
    model.weight.device  # prints device(type="cpu") => model is on CPU
    x.device  # => CPU
    y_true.device  # => CPU
    
    # let's move the model to GPU
    model.to("cuda")
    model.weight.device  # device(type="cuda", index=0) => GPU
    
    # what happens if we now input x into the model?
    x.device  # still CPU
    model(x)  # throws error: Expected all tensors to be on the same device...
    
    # so we clearly need to move x to GPU too
    x = x.to("cuda")  # note that model.to modifies the model, but Tensor.to returns a new tensor. The old tensor remains on CPU.
    x.device  # GPU
    y_pred = model(x)  # no error this time
    
    # now on which device is y_pred sitting?
    y_pred.device  # GPU
    # perhaps unsurprisingly, we computed y_pred using an input that
    # is on GPU and a model on GPU and we got a tensor on GPU
    
    # now you can figure out what happens when you compute a loss
    # using a model on GPU and inputs on GPU.
    

    To come back to your example

    # I continue to use the model from above that is already on GPU
    # I will also continue to use y_pred
    total_loss = torch.tensor([0]).to("cuda")  # actually this doesn't matter
    
    loss1 = y_true - y_pred  # whoops, this raises an error: y_true is still on CPU
    y_true = y_true.to("cuda")
    loss1 = y_true - y_pred  # you subtract two tensors, you get a tensor, no need to create a new one
    loss1.device  # GPU
    
    # what happens if we waste resources and create a new tensor?
    torch.tensor(y_true - y_pred).device  # we get a warning, but the result is still on GPU
    
    loss2 = 0.5  # clearly not a tensor, but could be
    total_loss = loss1 + loss2  # Here, we set total_loss to a new value, so no need to initialise it above
    total_loss.device  # GPU
    

    So there is no need to move your loss to the GPU: it is already there if you handle things correctly.
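
    If you also want to keep a running total across batches (for logging, say), one possible approach is to accumulate a detached copy of each batch loss on the same device. This is only a sketch under assumptions: the names running_loss and num_batches, the squared-error loss and the CPU fallback are made up for the example and are not part of the question's code.

```python
import torch

# Assumption: fall back to CPU so the sketch also runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(2, 1).to(device)
x = torch.zeros(4, 2, device=device)       # created directly on the device
y_true = torch.ones(4, 1, device=device)

running_loss = torch.zeros((), device=device)  # accumulator on the same device
num_batches = 3                                # hypothetical batch count
for _ in range(num_batches):
    model.zero_grad()
    loss = ((y_true - model(x)) ** 2).mean()   # already on `device`
    loss.backward()
    # detach() keeps the accumulator out of the autograd graph, so the
    # graph of each batch is not kept alive by the running total.
    running_loss += loss.detach()

# A single device-to-host copy at the end, to get a plain Python float.
mean_loss = running_loss.item() / num_batches
```

    Call backward() on the per-batch loss; the accumulated total is only used for reporting.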