Assume I have 4 different datasets and 4 GPUs, like below.

4 datasets:

dat0 = [np.array(...)], dat1 = [np.array(...)], dat2 = [np.array(...)], dat3 = [np.array(...)]

4 GPUs:

devices = [torch.device(f'cuda:{i}') for i in range(torch.cuda.device_count())]

Assume all four datasets have already been converted to tensors and transferred to the 4 different GPUs.

Now I have a function f from another module which can run on a GPU.

How can I do the following at the same time: compute the 4 results

ans0 = f(dat0) on devices[0], ans1 = f(dat1) on devices[1], ans2 = f(dat2) on devices[2], ans3 = f(dat3) on devices[3]

then move all 4 results back to the CPU and compute the sum

ans = ans0 + ans1 + ans2 + ans3
Assuming you only need ans for inference, you can easily perform those operations, but you will need a copy of f on all four GPUs at the same time.

Here is what I would try: replicate f once per GPU, launch the four forward passes, then send each result back to the CPU for the final sum:
import copy

# nn.Module has no .clone(); deepcopy the module, then move each copy to its GPU
fns = [copy.deepcopy(f).to(device) for device in devices]

# launch all four forward passes first -- CUDA kernels are queued
# asynchronously, so the four GPUs can work in parallel
outs = [fn(data) for fn, data in zip(fns, datasets)]

# .cpu() synchronizes each device and copies its result back to host memory
results = [out.detach().cpu() for out in outs]

ans = torch.stack(results).sum(dim=0)
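For reference, here is a self-contained sketch of the same idea. The stand-in for f (a `torch.nn.Linear`) and the toy data shapes are assumptions for illustration only; the sketch also falls back to the CPU so it runs on a machine without CUDA:

```python
import copy
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the external function f; any nn.Module works here.
f = torch.nn.Linear(8, 8)

# Use every visible GPU, falling back to CPU so the sketch runs anywhere.
devices = [torch.device(f'cuda:{i}') for i in range(torch.cuda.device_count())]
devices = devices or [torch.device('cpu')]

# Four toy datasets, assigned round-robin if there are fewer devices than datasets.
datasets = [torch.randn(4, 8).to(devices[i % len(devices)]) for i in range(4)]

# One replica of f per device, made with deepcopy (nn.Module has no .clone()).
replicas = {d: copy.deepcopy(f).to(d) for d in devices}

with torch.no_grad():
    # Launch all forward passes first, then copy back: CUDA kernels are queued
    # asynchronously, and .cpu() is what forces each device to synchronize.
    outs = [replicas[x.device](x) for x in datasets]
    results = [out.cpu() for out in outs]

ans = torch.stack(results).sum(dim=0)
print(ans.shape)  # torch.Size([4, 8])
```

If f holds no parameters or buffers (a plain function rather than a module), the deepcopy step is unnecessary; you can call it directly on each device's tensors.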