The following code from the PyTorch data parallelism tutorial looks strange to me:
```python
import torch
import torch.nn as nn

# Model, input_size and output_size are defined earlier in the tutorial.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)

model.to(device)
```
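One detail that makes the last line look odd is that `model.to(device)` is never reassigned. A minimal check, using a hypothetical `nn.Linear(5, 2)` in place of the tutorial's `Model`, suggests that `nn.Module.to()` works in place and returns the module itself, and that calling it on the `DataParallel` wrapper also moves the wrapped module (reachable as `.module`):

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the tutorial's Model, for illustration only.
model = nn.Linear(5, 2)
wrapped = nn.DataParallel(model)

# nn.Module.to() moves parameters and buffers in place and returns the same object,
# so a bare `model.to(device)` without reassignment is enough.
returned = wrapped.to(device)
print(returned is wrapped)  # True

# Calling .to() on the wrapper recurses into the wrapped module.
print(wrapped.module is model)  # True
print(next(wrapped.module.parameters()).device)
```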
To the best of my knowledge, `model.to(device)` copies the model to the GPU.
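As far as I can tell, the model and the input batch are moved separately: `Module.to()` moves parameters and buffers, while a tensor's `.to()` returns a tensor on the target device and leaves the original untouched. A small sketch, with `nn.Linear(5, 2)` and a random batch of 30 samples as hypothetical stand-ins:

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.Linear(5, 2).to(device)  # moves the model's parameters

data = torch.randn(30, 5)  # a CPU batch of 30 samples
moved = data.to(device)    # Tensor.to() returns a tensor on `device`; `data` is unchanged

print(data.device)   # still cpu
print(moved.device)
output = model(moved)
print(output.shape)  # torch.Size([30, 2])
```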

> DataParallel splits your data automatically and sends job orders to multiple models on several GPUs. After each model finishes their job, DataParallel collects and merges the results before returning it to you.

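My mental model of that description is the four steps that `torch.nn.parallel` exposes as `scatter`, `replicate`, `parallel_apply` and `gather`. The sketch below only simulates them on the CPU with plain tensor ops and `copy.deepcopy` (an assumption for illustration, not how `DataParallel` is actually implemented), splitting a batch of 30 into three chunks of 10 as in the tutorial's comment:

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(5, 2)    # hypothetical stand-in for the tutorial's Model
batch = torch.randn(30, 5)
num_replicas = 3           # pretend there are 3 GPUs

# 1. Scatter: split the batch along dim 0, [30, 5] -> three [10, 5] chunks.
chunks = batch.chunk(num_replicas, dim=0)

# 2. Replicate: one copy of the model per "device" (plain deep copies here).
replicas = [copy.deepcopy(model) for _ in range(num_replicas)]

# 3. Parallel apply: each replica processes its own chunk (sequentially in this sketch).
outputs = [replica(chunk) for replica, chunk in zip(replicas, chunks)]

# 4. Gather: concatenate the partial results back along dim 0.
merged = torch.cat(outputs, dim=0)

print([c.shape[0] for c in chunks])                     # [10, 10, 10]
print(torch.allclose(merged, model(batch), atol=1e-6))  # True
```

In the real wrapper, the replicas are broadcast from the module on `device_ids[0]` on every forward pass, so gradients flow back to the original parameters; the deep copies here do not do that.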
If `DataParallel` already does the copying, what does `to(device)` do here?
The tutorial adds a few lines to explain `nn.DataParallel`:

> DataParallel splits your data automatically, and send job orders to multiple models on different GPUs using the data. After each model finishes their job, DataParallel collects and merges the results for you.

I read the quote above as saying that `nn.DataParallel` is just a wrapper class that tells `model.cuda()` to make multiple copies on the GPUs.
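One quick way to see what kind of wrapper it is: the sketch below (with a hypothetical `nn.Linear(5, 2)`) shows that `nn.DataParallel` keeps the original module as its `.module` attribute instead of copying it at construction time, which is also why its `state_dict()` keys gain a `module.` prefix:

```python
import torch.nn as nn

model = nn.Linear(5, 2)  # hypothetical stand-in model
wrapped = nn.DataParallel(model)

# The original module is stored (not copied) as the `module` attribute.
print(wrapped.module is model)  # True

# Because of that nesting, state_dict keys gain a "module." prefix.
print(list(model.state_dict().keys()))    # ['weight', 'bias']
print(list(wrapped.state_dict().keys()))  # ['module.weight', 'module.bias']
```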

In my case, I don't have a GPU on my laptop, yet I can still call `nn.DataParallel()` without any problem:
```python
import torch
import torchvision
model = torchvision.models.alexnet()
model = torch.nn.DataParallel(model)
# No error appears if I don't move the model to `cuda`
```
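A small CPU-only check of why no error appears: on a machine without a GPU, `DataParallel` appears to record no `device_ids`, and its `forward()` then just calls the wrapped module directly. A hypothetical `nn.Linear(5, 2)` stands in for AlexNet here to keep it quick:

```python
import torch
import torch.nn as nn

model = nn.Linear(5, 2)  # hypothetical small stand-in for alexnet()
wrapped = nn.DataParallel(model)
x = torch.randn(4, 5)

# With no GPU, device_ids is expected to be empty and forward() simply calls model(x).
print(torch.cuda.is_available(), wrapped.device_ids)
print(torch.equal(wrapped(x), model(x)))  # True on a CPU-only machine
```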