
Will switching between GPU and CPU affect the gradient in PyTorch backpropagation?

I am using PyTorch. During the computation, I move some data and an operator A to the GPU. At an intermediate step, I move the data to the CPU, where an operator B runs, and continue the forward pass there.

My question is:

Operator B consumes too much memory to run on the GPU. Will splitting the computation this way (some parts on the GPU, the rest on the CPU) affect backpropagation?


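  • PyTorch keeps track of the device each tensor lives on. If you move tensors with PyTorch's native commands, .cpu() or .to('cpu'), autograd records the transfer and you should be okay: gradients are moved back to the original device during the backward pass.

    See, e.g., this model parallel tutorial, where the computation is split between two different GPU devices.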
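
As a rough check of the point above, here is a minimal sketch in which the gradient crosses a device boundary. The operators are hypothetical stand-ins (a multiplication for A, a squared sum for B), not the ones from the question, and the script falls back to the CPU when no GPU is available, so the transfer becomes a no-op in that case.

```python
import torch

# Use the GPU if one is present; otherwise fall back to the CPU so the sketch still runs.
gpu = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Leaf tensor that lives on the (possibly GPU) device.
x = torch.ones(3, device=gpu, requires_grad=True)

# Stand-in for operator A, computed on the GPU.
h = x * 2

# Move the intermediate result to the CPU; autograd records this transfer.
h_cpu = h.cpu()

# Stand-in for the memory-hungry operator B, computed on the CPU.
loss = (h_cpu ** 2).sum()

# Backward pass: the gradient flows from the CPU back to x's device.
loss.backward()

print(x.grad.device)  # same device as x
print(x.grad)         # d/dx sum((2x)^2) = 8x, so every entry is 8
```

Note that this only holds while the tensors stay inside autograd. Converting to NumPy or calling .detach() in the middle of the forward pass would cut the graph, and gradients would no longer reach the earlier operators.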