PyTorch — proper way to compute loss on GPU?

What is the proper way to handle loss values with PyTorch CUDA? For example:

Should I store the loss value on the GPU?
How do I move the loss value to the GPU?
How do I update the loss value on the GPU?

Inside __init__():

self.device = torch.device('cuda')
self.model = self.model.to(self.device)
self.total_loss = torch.Tensor([0]).to(self.device)

For each batch:

self.loss1 = torch.Tensor(y_true - y_pred)
self.loss2 = 0.5 # some other loss
self.total_loss = self.loss1 + self.loss2
self.total_loss.backward()

Solution

TL;DR: Your loss is probably on the GPU already.

You need to move all the data you use for computing the loss to the GPU manually. That primarily means the model inputs and the ground-truth outputs. Usually you load these with a data loader and then move each batch to the GPU, as demonstrated in this PyTorch tutorial.
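A minimal sketch of that pattern, assuming a hypothetical in-memory dataset built with `TensorDataset` (the data and batch size are made up for illustration). It falls back to the CPU when no GPU is available, so it runs either way:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Use the GPU if there is one; otherwise everything below runs on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical toy data: 8 samples with 2 features and 1 target each.
inputs = torch.randn(8, 2)
targets = torch.randn(8, 1)
loader = DataLoader(TensorDataset(inputs, targets), batch_size=4)

model = torch.nn.Linear(2, 1).to(device)  # Module.to moves the parameters in place

for x, y_true in loader:
    # The loader yields CPU tensors; Tensor.to returns copies on `device`.
    x, y_true = x.to(device), y_true.to(device)
    y_pred = model(x)
    print(x.device, y_true.device, y_pred.device)  # all on the same device
```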

Now, for your case, let's see what happens when we don't move tensors to the GPU, and which tensors are already there.

import torch

# this is just for demo
model = torch.nn.Linear(2, 1)
x = torch.zeros((1, 2))
y_true = torch.ones((1, 1))

# Here the interesting stuff starts...
# We cannot query the model for its device directly,
# but in this case we can check the weight matrix.
model.weight.device  # device(type='cpu') => model is on CPU
x.device  # => CPU
y_true.device  # => CPU

# let's move the model to GPU
model.to("cuda")
model.weight.device  # device(type="cuda", index=0) => GPU

# what happens if we now input x into the model?
x.device  # still CPU
model(x)  # throws error: Expected all tensors to be on the same device...

# so we clearly need to move x to GPU too
x = x.to("cuda")  # note that model.to modifies the model, but Tensor.to returns a new tensor. The old tensor remains on CPU.
x.device  # GPU
y_pred = model(x)  # no error this time

# now on which device is y_pred sitting?
y_pred.device  # GPU
# perhaps unsurprisingly, we computed y_pred using an input that
# is on GPU and a model on GPU and we got a tensor on GPU

# now you can figure out what happens when you compute a loss using
# a model on GPU and inputs on GPU.
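To make that concrete, here is a short sketch using the same toy model and data, with `torch.nn.functional.mse_loss` picked only as an example loss. It shows that the loss lands on whatever device its inputs are on, and falls back to the CPU if CUDA is unavailable:

```python
import torch
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(2, 1).to(device)
x = torch.zeros((1, 2), device=device)      # create the input directly on the device
y_true = torch.ones((1, 1), device=device)

y_pred = model(x)
loss = F.mse_loss(y_pred, y_true)  # computed only from tensors on `device`

print(loss.device)         # same device as the model, x and y_true
print(loss.requires_grad)  # True: the loss is still connected to model.weight
```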

To come back to your example:

# I continue to use the model from above that is already on GPU
# I will also continue to use y_pred
total_loss = torch.tensor([0]).to("cuda")  # unnecessary: total_loss is reassigned below anyway

loss1 = y_true - y_pred  # whoops: y_true is still on CPU, so this raises a device-mismatch error
y_true = y_true.to("cuda")
loss1 = y_true - y_pred  # you subtract two tensors, you get a tensor, no need to create a new one
loss1.device  # GPU

# what happens if we waste resources and create a new tensor?
torch.tensor(y_true - y_pred).device  # we get a warning; the copy is still on GPU, but it is detached from the autograd graph

loss2 = 0.5  # clearly not a tensor, but could be
total_loss = loss1 + loss2  # Here, we set total_loss to a new value, so no need to initialise it above
total_loss.device  # GPU
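Besides the warning, copying with `torch.tensor` has a less visible cost. The PyTorch documentation states that `torch.tensor(t)` is equivalent to `t.clone().detach()`, so the copy keeps the device but no longer carries gradient history, and gradients would not flow back to the model through it. A small sketch of that, falling back to the CPU when CUDA is unavailable:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(2, 1).to(device)
x = torch.zeros((1, 2), device=device)
y_true = torch.ones((1, 1), device=device)
y_pred = model(x)

loss1 = y_true - y_pred       # result of an operation: part of the autograd graph
copied = torch.tensor(loss1)  # emits a UserWarning and makes a detached copy

print(loss1.grad_fn is not None)      # True: gradients can flow back to model.weight
print(copied.device == loss1.device)  # True: the copy stays on the same device
print(copied.requires_grad)           # False: gradients stop at the copy
```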

So there is no need to move your loss to the GPU yourself: if the model and the data the loss is computed from are on the GPU, the loss is already on the GPU too.
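Putting this together for the code in the question, a per-batch step might look like the sketch below. The toy data, the SGD optimizer and the learning rate are assumptions, since the question does not show them. `loss1` is reduced with `.mean()` so that `backward()` gets a scalar for batches larger than one, and `.item()` is used only to pull a Python float out for logging:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical toy data and model, just to make the sketch runnable.
loader = DataLoader(TensorDataset(torch.randn(8, 2), torch.randn(8, 1)), batch_size=4)
model = torch.nn.Linear(2, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # assumed optimizer

running_loss = 0.0  # a plain Python float; it does not need to live on the GPU

for x, y_true in loader:
    x, y_true = x.to(device), y_true.to(device)
    y_pred = model(x)

    loss1 = (y_true - y_pred).mean()  # already on `device`; no torch.Tensor(...) wrapper
    loss2 = 0.5                       # a Python number is fine here
    total_loss = loss1 + loss2        # new tensor on `device`, still in the graph

    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()

    running_loss += total_loss.item()  # copies the scalar value to the CPU for logging

print(running_loss / len(loader))
```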