python, tensorflow, object-detection, object-detection-api

How to train a TF model that is larger than GPU memory?


I want to train a large object detection model using TF2, preferably the EfficientDet D7 network. On my Tesla P100 card with 16 GB of memory I run into an "out of memory" exception, i.e. not enough memory can be allocated on the graphics card.

So I am wondering what my options are in this case. Is it correct that with multiple GPUs the TF model would be split so that it fills the memory of both cards? In my case, with a second 16 GB Tesla card, would I then have 32 GB in total available during training? If so, would the same hold for a cloud provider, where I could utilize multiple GPUs?

Moreover, if I am wrong and a model cannot be split across multiple GPUs during training, what other approach would let me train a large network that does not fit into my GPU memory?

PS: I know that I could reduce the batch_size to 1, but unfortunately that still does not solve my issue for the really large models ...


Solution

  • You can use multiple GPUs in GCP (Google Cloud Platform) at least; I am not too sure about other cloud providers. And yes, once you do that, you can train with a larger batch size (the exact number depends on the GPU, its memory, and how many GPUs you have running in your VM); a minimal multi-GPU training sketch is shown after this answer.

    You can check this link for the list of all GPUs available in GCP.

    If you're using the Object Detection API, you can check this post regarding training with multiple GPUs.

    Alternatively, if you want to stay with a single GPU, one clever trick is gradient accumulation, where you can virtually increase your batch size without using much extra GPU memory; this is discussed in this post, and a second sketch of the idea follows below.
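
As a rough illustration of the multi-GPU route, here is a minimal sketch of data-parallel training with tf.distribute.MirroredStrategy. The tiny stand-in model, the per-replica batch size of 8, and the random placeholder dataset are assumptions for the example only; if you train through the Object Detection API, its own training script typically wires up the distribution strategy for you. Note that MirroredStrategy replicates the full model on every GPU (data parallelism), so it mainly buys you a larger global batch size rather than splitting one oversized model across cards.

```python
# Minimal sketch: data-parallel training across all visible GPUs with
# tf.distribute.MirroredStrategy. Model, dataset and sizes are placeholders.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# The global batch is split evenly across the replicas (8 examples per GPU here).
GLOBAL_BATCH_SIZE = 8 * strategy.num_replicas_in_sync

with strategy.scope():
    # A small CNN stands in for the (much larger) detection network.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(128, 128, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-4),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Placeholder dataset of random images; replace with the real input pipeline.
images = tf.random.normal([64, 128, 128, 3])
labels = tf.random.uniform([64], 0, 10, dtype=tf.int32)
train_dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(GLOBAL_BATCH_SIZE)

model.fit(train_dataset, epochs=1)
```

Building and compiling the model inside strategy.scope() is what makes Keras mirror the variables and aggregate gradients across the GPUs.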
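
And here is a minimal sketch of gradient accumulation in plain eager TF2, assuming a custom training loop. The tiny Dense model, random data, and the ACCUM_STEPS value are illustrative placeholders, not the implementation from the linked post. Gradients from several small batches are summed and applied once, which emulates a larger effective batch size without holding more activations in GPU memory.

```python
# Minimal sketch: gradient accumulation over ACCUM_STEPS small batches.
import tensorflow as tf

ACCUM_STEPS = 4   # effective batch size = ACCUM_STEPS * BATCH_SIZE
BATCH_SIZE = 2

model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(128,))])
optimizer = tf.keras.optimizers.Adam(1e-4)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# One zero-initialized accumulator per trainable variable.
accum_grads = [tf.zeros_like(v) for v in model.trainable_variables]

for step in range(100):
    # Placeholder batch; replace with the real input pipeline.
    x = tf.random.normal([BATCH_SIZE, 128])
    y = tf.random.uniform([BATCH_SIZE], 0, 10, dtype=tf.int32)

    with tf.GradientTape() as tape:
        # Scale the loss so the summed gradient matches one large-batch gradient.
        loss = loss_fn(y, model(x, training=True)) / ACCUM_STEPS
    grads = tape.gradient(loss, model.trainable_variables)
    accum_grads = [a + g for a, g in zip(accum_grads, grads)]

    # Apply and reset the accumulated gradients every ACCUM_STEPS batches.
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.apply_gradients(zip(accum_grads, model.trainable_variables))
        accum_grads = [tf.zeros_like(v) for v in model.trainable_variables]
```

Dividing the loss by ACCUM_STEPS keeps the accumulated gradient on the same scale as a single large-batch gradient, so the learning rate does not need to be retuned for the accumulation factor.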