I use PyTorch. In my computation, I move some data and operators A to the GPU. In a middle step, I move the data and operator B to the CPU and continue the forward pass.
My question is:
Operator B is so memory-consuming that it cannot run on the GPU. Will this split (some parts computed on the GPU, the others on the CPU) affect backpropagation?
PyTorch keeps track of the location of tensors, and autograd records device transfers like any other operation. As long as you move tensors with PyTorch's native commands, such as .cpu() or .to('cpu') (rather than, say, detaching and converting to NumPy), gradients will flow back across the device boundary and you should be okay.
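Here is a minimal sketch of that (it assumes a CUDA device is available; the tensor sizes and the operations standing in for A and B are made up for illustration):

```python
import torch

x = torch.randn(4, 4, device='cuda', requires_grad=True)

y = x * 2          # "operator A", computed on the GPU
y = y.cpu()        # move the intermediate result to the CPU
z = (y.sum()) * 3  # "operator B", computed on the CPU

z.backward()          # backprop crosses the CPU/GPU boundary automatically
print(x.grad.device)  # cuda:0 -- the gradient lands back on the GPU
```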
See, e.g., this model parallel tutorial, where the computation is split between two different GPU devices; the same principle applies to a CPU/GPU split.
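Adapting the tutorial's pattern to your case might look like the following sketch (the SplitModel class and its layer sizes are illustrative, not taken from the tutorial):

```python
import torch
import torch.nn as nn

class SplitModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.gpu_part = nn.Linear(16, 32).to('cuda')  # memory-light part on the GPU
        self.cpu_part = nn.Linear(32, 8)              # memory-heavy part stays on the CPU

    def forward(self, x):
        x = self.gpu_part(x.to('cuda'))
        return self.cpu_part(x.cpu())  # hand off to the CPU mid-forward

model = SplitModel()
out = model(torch.randn(2, 16))
out.sum().backward()  # autograd handles both devices transparently
```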