Tags: python, tensorflow, gpu, object-detection-api

Will multiple GPUs allow for larger models and batch sizes when using TF2 (object detection API)?


I am using the TF2 research object detection API with the official models from the model zoo. However, when I try to train a large model (e.g. one of the larger EfficientNet-based models) and/or use a large batch size (e.g. > 32), I run out of GPU memory.

Now I am thinking of either renting some cloud GPUs or upgrading my local hardware with a second GPU. The idea is to train the TF2 models on multiple GPUs. However, before I spend the money, I would like to know whether this would even solve my problem.

So, when one trains a TF2 model (via the object detection API) on multiple GPUs, would that also "combine" their memory, so that I can train larger models and/or use larger batch sizes?


Solution

  • You can refer to this post for training with multiple GPUs. With data-parallel training, each GPU processes its own slice of the batch, so you can set a higher global batch size with little to no change to your code (see the sketch after this list). Note, however, that each GPU still holds a full copy of the model, so data parallelism alone does not let you fit a larger model.

    However, to split a single large model across multiple GPUs (model parallelism), you will have to make some code changes, which you can refer to here; a minimal device-placement sketch is also shown below the list.

    You can check out the full list of strategies for distributed training here.
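
    As a rough illustration of the first point, here is a minimal data-parallel sketch using `tf.distribute.MirroredStrategy`. The toy model and data are placeholders, not the detection API's own training loop (which wires up the strategy for you); the point is that the global batch is split evenly across the replicas, so each GPU only has to hold its own shard:

    ```python
    import tensorflow as tf

    # Mirrors all model variables onto every visible GPU and splits each
    # batch across them (falls back to a single replica on CPU-only hosts).
    strategy = tf.distribute.MirroredStrategy()
    print("Number of replicas:", strategy.num_replicas_in_sync)

    # The global batch is divided by the number of replicas, so e.g.
    # 64 on 2 GPUs means each GPU only processes 32 samples at a time.
    GLOBAL_BATCH_SIZE = 64

    with strategy.scope():
        # Variables created inside the scope are mirrored on every GPU.
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(128, activation="relu", input_shape=(100,)),
            tf.keras.layers.Dense(10),
        ])
        model.compile(
            optimizer="adam",
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        )

    # Dummy data just to make the example runnable end to end.
    x = tf.random.normal((1024, 100))
    y = tf.random.uniform((1024,), maxval=10, dtype=tf.int32)
    model.fit(x, y, batch_size=GLOBAL_BATCH_SIZE, epochs=1)
    ```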
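
    And a minimal sketch of the second point: manual model parallelism via explicit device placement. The layer sizes and device names (`/GPU:0`, `/GPU:1`) are assumptions for illustration only; splitting a real detection model this way means restructuring its code, which is why it is not a configuration-level change:

    ```python
    import tensorflow as tf

    # Run different layers on different GPUs so the weights are split
    # across devices instead of all living on one card. The device names
    # assume two visible GPUs; with soft placement (the eager default),
    # TF falls back to an available device if one is missing.
    class SplitModel(tf.keras.Model):
        def __init__(self):
            super().__init__()
            self.block1 = tf.keras.layers.Dense(4096, activation="relu")
            self.block2 = tf.keras.layers.Dense(10)

        def call(self, x):
            with tf.device("/GPU:0"):
                x = self.block1(x)   # weights built and placed on GPU:0
            with tf.device("/GPU:1"):
                return self.block2(x)  # weights built and placed on GPU:1

    model = SplitModel()
    out = model(tf.random.normal((8, 100)))  # activations cross GPU:0 -> GPU:1
    print(out.shape)  # (8, 10)
    ```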