The bash file I used to launch the training looks like this:
CUDA_VISIBLE_DEVICES=3,4 python -m torch.distributed.launch \
--nproc_per_node=2 train.py \
--batch_size 6 \
--other_args
I found that the batch size of the tensors on each GPU is actually batch_size / num_of_gpus = 6 / 2 = 3.
When I initialize my network, I need to know the batch size on each GPU.
(P.S. at this stage I can't use input_tensor.shape to get the size of the batch dimension, since no data has been fed in yet.)
Somehow I could not find where PyTorch stores the parameter --nproc_per_node.
So how can I find out how many GPUs are being used, without passing it manually via --other_args?
I think you are looking for torch.distributed.get_world_size() - this will tell you how many processes were created. When you launch on a single node with torch.distributed.launch, that number equals --nproc_per_node.
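For reference, here is a minimal sketch of how this could look inside train.py to derive the per-GPU batch size before any data is fed to the network; the argument names (--batch_size, --local_rank) and the nccl backend are assumptions matching the launch script above, not code from the original post:

import argparse
import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=6)  # global batch size from the launch script
parser.add_argument("--local_rank", type=int, default=0)  # filled in by torch.distributed.launch
args = parser.parse_args()

# torch.distributed.launch sets MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE,
# so the default env:// initialization works here.
dist.init_process_group(backend="nccl")
torch.cuda.set_device(args.local_rank)

world_size = dist.get_world_size()                  # number of processes, e.g. 2
per_gpu_batch_size = args.batch_size // world_size  # e.g. 6 // 2 = 3

# per_gpu_batch_size can now be used when building the network.

Note that get_world_size() counts processes across all nodes, so on a single machine it matches --nproc_per_node.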