deep-learning, pytorch

Pytorch 1.0: what does net.to(device) do in nn.DataParallel?


The following code from the PyTorch data parallelism tutorial reads strangely to me:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
  print("Let's use", torch.cuda.device_count(), "GPUs!")
  # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
  model = nn.DataParallel(model)

model.to(device)

To the best of my knowledge, model.to(device) copies the data to the GPU.

DataParallel splits your data automatically and sends job orders to multiple models on several GPUs. After each model finishes their job, DataParallel collects and merges the results before returning it to you.

If DataParallel does the job of copying, what does to(device) do here?
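
For reference, here is what I understand to(device) to do on its own, on a toy module (the nn.Linear and the sizes are just placeholders, not the tutorial's Model):

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# toy stand-in for Model(input_size, output_size); the sizes are made up
net = nn.Linear(5, 2)

print(next(net.parameters()).device)  # cpu
net.to(device)   # for an nn.Module this moves parameters and buffers in place
print(next(net.parameters()).device)  # cuda:0 if a GPU is available, else still cpu

# inputs are not moved by net.to(device); they must be sent separately
x = torch.randn(3, 5).to(device)
y = net(x)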


Solution

  • They added a few lines in the tutorial to explain nn.DataParallel.

    DataParallel splits your data automatically and sends job orders to multiple models on different GPUs using the data. After each model finishes their job, DataParallel collects and merges the results for you.

    The quote above can be understood to mean that nn.DataParallel is just a wrapper class that tells model.cuda() (or model.to(device)) that it should make multiple copies of the model across the GPUs (see the sketch after the snippet below).

    In my case, I don't have any GPU on my laptop, yet I can still call nn.DataParallel() without any problem.

    import torch
    import torchvision
    
    model = torchvision.models.alexnet()
    model = torch.nn.DataParallel(model)
    # No error appears if I don't move the model to `cuda`
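
    So to(device) and DataParallel do different jobs: to(device) moves the single master copy of the parameters onto the primary device, while nn.DataParallel replicates that module to the other GPUs on each forward pass, splits the input batch along dim 0, and gathers the results back. A minimal sketch of the whole pattern, with an nn.Linear and made-up sizes standing in for the tutorial's Model:

    import torch
    import torch.nn as nn
    
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    
    # stand-in for Model(input_size, output_size); the sizes are illustrative
    model = nn.Linear(5, 2)
    
    if torch.cuda.device_count() > 1:
        # wrapping copies nothing yet; replication to the other GPUs
        # happens inside every forward call
        model = nn.DataParallel(model)
    
    # moves the single master copy of the parameters to cuda:0
    # (or leaves everything on the CPU if no GPU is available)
    model.to(device)
    
    # the batch only needs to be on the primary device; DataParallel splits it
    # along dim 0, scatters the chunks, and gathers the outputs back on cuda:0
    inputs = torch.randn(30, 5).to(device)
    outputs = model(inputs)
    print(outputs.shape)  # torch.Size([30, 2])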