
Distributed Data Parallel (DDP) Batch size


Suppose I use 2 GPUs in a DDP setting.

If I would use a batch size of 16 when running the experiment on a single GPU,

should I pass 8 or 16 as the batch size when using 2 GPUs with DDP?

Is a batch of 16 divided into 8 and 8 automatically?

Thank you!


Solution

  • As explained here:

    • "…the application of the given module by splitting the input across the specified devices"
    • "The batch size should be larger than the number of GPUs used locally"
    • "each replica handles a portion of the input"

    So if you use 16 as the batch size, it will be divided automatically between the two GPUs, and a small sketch of that split is shown below.
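
    To make the split concrete, here is a minimal, CPU-only sketch of how a batch of 16 gets chunked along the batch dimension into two portions of 8, which is what the wrapper does internally when it scatters the input across the devices. The tensor shape and the `num_gpus` variable are illustrative assumptions, not part of any real launch script.

    ```python
    import torch

    # Illustrative only: mimic how a wrapped module scatters a batch across devices.
    # The shape (16, 3, 224, 224) and num_gpus = 2 are assumptions for this sketch.
    batch = torch.randn(16, 3, 224, 224)   # batch size 16, e.g. 16 RGB images
    num_gpus = 2

    # Split along the batch dimension (dim=0), one chunk per GPU replica.
    chunks = torch.chunk(batch, num_gpus, dim=0)

    for i, chunk in enumerate(chunks):
        print(f"replica {i} receives {chunk.shape[0]} samples")  # prints 8 for each replica
    ```

    Each replica therefore handles 8 samples per step, so the 16 you pass is the total batch across both GPUs.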