I have 8 GPUs, 64 CPU cores (multiprocessing.cpu_count()=64)
I am trying to run inference on multiple video files using a deep learning model. I want some of the files to be processed on each of the 8 GPUs, and for each GPU I want a different set of 6 CPU cores to be utilized.
Below is the Python file, named inference_{gpu_id}.py:
Input 1: GPU_id
Input 2: Files to process for GPU_id
import cv2
import numpy as np
from tqdm import tqdm
from torch.multiprocessing import Pool, set_start_method

try:
    set_start_method('spawn', force=True)
except RuntimeError:
    pass

# gpu_id is the string device ordinal for this script, e.g. '0' for inference_0.py
model = load_model(device='cuda:' + gpu_id)

def pooling_func(file):
    preds = []
    cap = cv2.VideoCapture(file)
    count = 0  # frame counter
    while cap.isOpened():
        ret, frame = cap.read()
        count += 1
        if ret:
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            pred = model(frame)[0]
            preds.append(pred)
        else:
            break
    cap.release()
    np.save(file[:-4] + '.npy', preds)

def process_files():
    # all files to process on gpu_id
    files = np.load(gpu_id + '_files.npy')
    # I am hoping to use 6 cores for this gpu_id,
    # and a different 6 cores for a different GPU id
    pool = Pool(6)
    r = list(tqdm(pool.imap(pooling_func, files), total=len(files)))
    pool.close()
    pool.join()

if __name__ == '__main__':
    import multiprocessing
    multiprocessing.freeze_support()
    process_files()
I am hoping to run the inference_{gpu_id}.py files on all GPUs simultaneously.
Currently, I am able to run it successfully on one GPU with 6 cores, but when I try to run it on all GPUs together, only GPU 0 runs and all the others stop with the error message below.
RuntimeError: CUDA error: invalid device ordinal.
The commands I am running:
CUDA_VISIBLE_DEVICES=0 inference_0.py
CUDA_VISIBLE_DEVICES=1 inference_1.py
...
CUDA_VISIBLE_DEVICES=7 inference_7.py
Consider this: if you are not using the CUDA_VISIBLE_DEVICES flag, then all GPUs will be available to your PyTorch process. This means torch.cuda.device_count() will return 8 (assuming your version setup is valid), and you will be able to access each one of those 8 GPUs with torch.device, via torch.device('cuda:0'), torch.device('cuda:1'), ..., torch.device('cuda:7').
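For example, a quick sanity check (a minimal sketch; the reported device names depend on your machine):

import torch

# With no CUDA_VISIBLE_DEVICES restriction, all 8 GPUs are visible
print(torch.cuda.device_count())  # -> 8

# Each visible GPU is addressed by its ordinal
for i in range(torch.cuda.device_count()):
    print(f'cuda:{i} ->', torch.cuda.get_device_name(i))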
Now if you are only planning on using one device and want to restrict your process to it, then CUDA_VISIBLE_DEVICES=i (where i is the device ordinal) will make it so. In this case torch.cuda will only have access to a single device, through torch.device('cuda:0'). It doesn't matter what the actual device ordinal is; the way you access it is through torch.device('cuda:0').
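Applied to your script, this means the device string should always be 'cuda:0', regardless of which physical GPU the process ends up on (a sketch based on your load_model call):

# Inside the per-GPU script, launched with CUDA_VISIBLE_DEVICES={gpu_id}:
# the only visible GPU is always 'cuda:0' from this process's point of view.
model = load_model(device='cuda:0')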
If you allow access to more than one device, let's say n°0, n°4, and n°2, then you would use CUDA_VISIBLE_DEVICES=0,4,2. Consequently, you refer to your cuda devices via d0 = torch.device('cuda:0'), d1 = torch.device('cuda:1'), and d2 = torch.device('cuda:2'), in the same order as you defined them with the flag, i.e.:
d0 -> GPU n°0, d1 -> GPU n°4, and d2 -> GPU n°2.
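A quick way to check that mapping (a minimal sketch; check_devices.py is a hypothetical filename and the names printed depend on your hardware):

# Run with: CUDA_VISIBLE_DEVICES=0,4,2 python check_devices.py
import torch

# Only 3 devices are visible to this process, re-indexed as 0, 1, 2
print(torch.cuda.device_count())  # -> 3
for i in range(torch.cuda.device_count()):
    print(f'cuda:{i} ->', torch.cuda.get_device_name(i))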
This lets you run the same code on different GPUs without changing the underlying code where you refer to the device ordinal.
In summary, what you need to look at is the number of devices you need to run your code. In your case, 1 is enough. You will refer to it with torch.device('cuda:0'). When running your code, however, you will need to specify what that cuda:0 device is, with the flag:
> CUDA_VISIBLE_DEVICES=0 inference.py
> CUDA_VISIBLE_DEVICES=1 inference.py
...
> CUDA_VISIBLE_DEVICES=7 inference.py
Do note that 'cuda' will default to 'cuda:0'.
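If you would rather launch all eight processes from one place, here is a minimal Python launcher sketch (assumptions: a single inference.py that always uses 'cuda:0' internally and receives the GPU id as a command-line argument so it can find its {gpu_id}_files.npy file):

import os
import subprocess

# Launch one inference process per GPU; each process sees exactly one device,
# which it addresses as 'cuda:0' internally.
procs = []
for gpu_id in range(8):
    env = os.environ.copy()
    env['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
    procs.append(subprocess.Popen(['python', 'inference.py', str(gpu_id)], env=env))

# Wait for all per-GPU processes to finish
for p in procs:
    p.wait()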