I have tried many ways to set the CUDA device to 1 (or to 0,1), but most of them didn't work. I followed several suggested solutions without success...
# cmd: CUDA_VISIBLE_DEVICES=1 python mytest.py   # still shows 0
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # doesn't work either
# export CUDA_VISIBLE_DEVICES=1           # how exactly is this meant to be used?
if __name__ == "__main__":
    import torch
    print(torch.cuda.current_device())  # shows 0 -- the os.environ method doesn't work
    torch.cuda.set_device(0)
    print(torch.cuda.current_device())  # shows 0 -- works
    torch.cuda.set_device(1)
    print(torch.cuda.current_device())  # shows 1 -- works
    torch.cuda.set_device('cuda:0')
    print(torch.cuda.current_device())  # shows 0 -- works
    torch.cuda.set_device('cuda:1')
    print(torch.cuda.current_device())  # shows 1 -- works
    torch.cuda.set_device('cuda:0,1')   # Error: why "Invalid device string: 'cuda:0,1'"?
    print(torch.cuda.current_device())
    device = torch.device("cuda:1")     # only creates a device object
    print(torch.cuda.current_device())  # shows 0 -- doesn't work
The only method that works is torch.cuda.set_device(). However, when I want to use two GPUs, set_device('cuda:0,1') raises the error above. Does it simply not accept multiple devices? If so, how can I set multiple devices?
P.S. My environment is a Linux server; nvidia-smi shows:
CUDA Version: 10.2
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   68C    P0    62W /  70W |      0MiB / 15109MiB |     26%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   40C    P0    16W /  70W |      0MiB / 15109MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+
Remove all device-selection code from your script and try:

CUDA_VISIBLE_DEVICES=0,1 python test.py

CUDA_VISIBLE_DEVICES automatically remaps the visible GPUs to logical IDs starting at 0. That is also why CUDA_VISIBLE_DEVICES=1 "still shows 0": only physical GPU 1 is visible to the process, and PyTorch renumbers it as cuda:0.
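As a quick sanity check of the remapping (a minimal sketch; it assumes the variable is set before CUDA is first initialized):

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # must be set before the first CUDA call

import torch

print(torch.cuda.device_count())      # 1  -- only physical GPU 1 is visible
print(torch.cuda.current_device())    # 0  -- the visible GPU is renumbered as cuda:0
print(torch.cuda.get_device_name(0))  # 'Tesla T4' -- logical device 0 is physical GPU 1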
If it still doesn't work, you may be setting up distributed data parallel incorrectly.
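If the goal is simply to train on both GPUs from one process, the usual route is torch.nn.DataParallel (or DistributedDataParallel for multi-process training). A minimal DataParallel sketch; the nn.Linear here is just a toy stand-in for your real network:

import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # toy model for illustration

if torch.cuda.device_count() > 1:
    # Replicates the model on cuda:0 and cuda:1 and splits each batch between them.
    model = nn.DataParallel(model, device_ids=[0, 1])

model.to('cuda:0')  # parameters live on the first device

x = torch.randn(8, 10, device='cuda:0')
y = model(x)        # forward pass runs on both GPUs
print(y.shape)      # torch.Size([8, 2])

Note that set_device() only changes which single GPU is the current default; multi-GPU work always goes through one of these parallel wrappers, never through a combined device string like 'cuda:0,1'.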