tensorflow · keras · multi-gpu

Multi-GPU training does not reduce training time


I trained three UNet models for image segmentation with Keras to assess the effect of multi-GPU training.

  1. The first model was trained with a batch size of 1 on a single GPU (P100). Each training step took ~254 ms. (Note that this is per step, not per epoch.)
  2. The second model was trained with a batch size of 2 on a single GPU (P100). Each training step took ~399 ms.
  3. The third model was trained with a batch size of 2 on two GPUs (P100). Each training step took ~370 ms. Logically it should have taken about the same time as the first case, since each GPU processes one sample in parallel, but it actually took longer.

Can anyone tell me whether multi-GPU training actually reduces training time? For reference, all models were trained with Keras.
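
For context, a minimal sketch of how the two-GPU case (global batch size 2 split across two P100s) might be set up with `tf.distribute.MirroredStrategy`. The question does not include the actual training code, so the tiny model and random data below are placeholders standing in for the real UNet and dataset:

```python
import tensorflow as tf

def build_placeholder_model(input_shape=(128, 128, 3)):
    # Stand-in for the real UNet architecture, which is not shown in the question.
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)

# Case 3: two GPUs. MirroredStrategy splits the global batch of 2,
# so each GPU sees exactly one sample per training step.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = build_placeholder_model()
    model.compile(optimizer="adam", loss="binary_crossentropy")

# Dummy data just to make the sketch runnable end to end.
x = tf.random.uniform((64, 128, 128, 3))
y = tf.cast(tf.random.uniform((64, 128, 128, 1)) > 0.5, tf.float32)
model.fit(x, y, batch_size=2, epochs=1)
```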


Solution

  • I presume this is because you are using a very small batch size. In that case, the cost of distributing the gradients/computations across two GPUs and gathering them back (plus the CPU-to-GPU data transfer for both devices) outweighs the parallel speed-up you might gain over sequential training on a single GPU.

    Expect to see a bigger difference with a batch size of, for instance, 8 or 16.
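
    A minimal sketch of that suggestion, assuming TensorFlow 2's `tf.distribute.MirroredStrategy`: keep the per-GPU batch fixed (8 here, purely illustrative) and scale the global batch with the number of replicas, so each device has enough work per step to amortise the gradient-synchronisation overhead. The model and data are again placeholders:

    ```python
    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()
    n_replicas = strategy.num_replicas_in_sync

    # Fix the per-GPU batch (illustrative value) and scale the global batch
    # with the number of GPUs, so each device stays busy per step.
    per_gpu_batch = 8
    global_batch_size = per_gpu_batch * n_replicas

    with strategy.scope():
        # Placeholder model standing in for the UNet.
        model = tf.keras.Sequential([
            tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu",
                                   input_shape=(128, 128, 3)),
            tf.keras.layers.Conv2D(1, 1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy")

    # Dummy data to make the sketch runnable.
    x = tf.random.uniform((256, 128, 128, 3))
    y = tf.cast(tf.random.uniform((256, 128, 128, 1)) > 0.5, tf.float32)
    model.fit(x, y, batch_size=global_batch_size, epochs=1)
    ```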