python · pytorch · cuda

Setting CUDA_VISIBLE_DEVICES has no effect even though I set it before importing PyTorch


I have tried many methods to set the CUDA device to 1 (or to 0,1), but most of them didn't work. I followed some suggested solutions, but they didn't work either:

# cmd: CUDA_VISIBLE_DEVICES=1, python mytest.py   # still shows 0

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # doesn't work

# export CUDA_VISIBLE_DEVICES=1  # how do I use this, exactly?

if __name__ == "__main__":

    import torch
    print(torch.cuda.current_device())  # shows 0, the os.environ method doesn't work

    torch.cuda.set_device(0)
    print(torch.cuda.current_device())  # shows 0, works

    torch.cuda.set_device(1)
    print(torch.cuda.current_device())  # shows 1, works

    torch.cuda.set_device('cuda:0')
    print(torch.cuda.current_device())  # shows 0, works

    torch.cuda.set_device('cuda:1')
    print(torch.cuda.current_device())  # shows 1, works

    torch.cuda.set_device('cuda:0,1')  # Error: why does it show "Invalid device string: 'cuda:0,1'"?
    print(torch.cuda.current_device())

    device = torch.device("cuda:1")
    print(torch.cuda.current_device())  # shows 0, doesn't work
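
For comparison, a minimal standalone sketch (assuming nothing has imported torch earlier in the process) that sets the variable before the first import would be:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # must be set before CUDA is initialized

import torch
print(torch.cuda.device_count())      # expected: 1 (only physical GPU 1 is visible)
print(torch.cuda.current_device())    # expected: 0 (the visible GPU is renumbered as cuda:0)
print(torch.cuda.get_device_name(0))  # name of physical GPU 1 (Tesla T4 here)

Note that even when the masking works, current_device() is still expected to print 0, because the single visible GPU is renumbered as cuda:0.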

The only method that works is torch.cuda.set_device(). However, when I want to use two GPUs, e.g. set_device('cuda:0,1'), it raises an error, presumably because it cannot accept multiple devices. If so, how can I select multiple devices?
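
As far as I understand, torch.device("cuda:1") only constructs a device object and does not switch the current device; it takes effect once a tensor or module is actually placed on it, roughly like this:

import torch

device = torch.device("cuda:1")     # just describes a device, changes nothing by itself
x = torch.randn(4, 4).to(device)    # this tensor now lives on cuda:1
print(x.device)                     # cuda:1
print(torch.cuda.current_device())  # still 0: moving tensors doesn't change the current device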

P.S. My environment is a Linux server; nvidia-smi shows:

 CUDA Version: 10.2  

+-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   68C    P0    62W /  70W |      0MiB / 15109MiB |      26%     Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   40C    P0    16W /  70W |      0MiB / 15109MiB |       6%     Default |
+-------------------------------+----------------------+----------------------+

Solution

  • Remove all device-selection code from the script and try:

    CUDA_VISIBLE_DEVICES=0,1 python test.py

    The visible device IDs are remapped automatically, so inside the process the two GPUs appear as cuda:0 and cuda:1.

    If that doesn't work, maybe you implemented distributed data parallel in the wrong way? See the sketch below.
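
    For example, a data-parallel test.py (hypothetical model and shapes, using the standard nn.DataParallel API) could look roughly like this once CUDA_VISIBLE_DEVICES=0,1 is set on the command line:

    import torch
    import torch.nn as nn

    # with CUDA_VISIBLE_DEVICES=0,1 the remapped IDs 0 and 1 are the two Tesla T4s
    model = nn.DataParallel(nn.Linear(10, 10), device_ids=[0, 1]).to('cuda:0')

    x = torch.randn(32, 10, device='cuda:0')
    y = model(x)        # the batch is split across cuda:0 and cuda:1
    print(y.shape)      # torch.Size([32, 10])

    The device_ids here refer to the remapped IDs inside the process, not the physical GPU numbers.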