Tags: tensorflow, object-detection, object-detection-api

Best practice for TF Object Detection API for very large images


The TensorFlow Object Detection API offers a variety of models. These are trained at a 600x600 image size. Suppose I have a 6000x4000 satellite image, and I want to detect objects continuously throughout the image. What is the best practice for adapting an Object Detection API model to this image size? I don't care about the running time per image for object detection. I have a GPU with 9 GB of RAM. I know I can fit a single 6000x4000 image onto this GPU, but I'm not sure whether a neural network that processes images of that size will also fit. I can think of a few alternatives:

  • Chip the image into 600x600 blocks, which risks losing features that cross the blocks, but then everything should work out of the box.

  • Change the image dimensions in the model definition from 600x600 to 6000x4000. Can I retrain from the Model Zoo checkpoint, or do I have to start from scratch if I do this?

  • Downscale the image to a smaller size. This distorts the image dimensions and also loses feature detail. For, say, a picture of a city, the resulting detail would not be adequate to pick out cars and small houses.
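The first alternative can be made less risky by letting neighbouring blocks overlap, so that an object cut by one block boundary is fully visible in the adjacent block. The sketch below is only an illustration under assumptions that are not in the question: the helper names (`tile_origins`, `tile_boxes`, `to_global`) are hypothetical, the 100-pixel overlap is an arbitrary choice that should be at least as large as the biggest object of interest, and boxes are taken to be `(xmin, ymin, xmax, ymax)` in pixels.

```python
def tile_origins(length, tile, overlap):
    """Start offsets of tiles of size `tile` that cover [0, length)."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if tile >= length:
        return [0]
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    # Place the last tile flush with the edge so no pixels are left uncovered.
    origins.append(length - tile)
    return origins


def tile_boxes(width, height, tile=600, overlap=100):
    """Return (x0, y0, x1, y1) crop boxes covering the whole image."""
    boxes = []
    for y0 in tile_origins(height, tile, overlap):
        for x0 in tile_origins(width, tile, overlap):
            boxes.append((x0, y0, min(x0 + tile, width), min(y0 + tile, height)))
    return boxes


def to_global(box, origin):
    """Shift a detection box from tile coordinates to full-image coordinates."""
    xmin, ymin, xmax, ymax = box
    x0, y0 = origin
    return (xmin + x0, ymin + y0, xmax + x0, ymax + y0)


boxes = tile_boxes(6000, 4000)
print(len(boxes), "tiles; first:", boxes[0], "last:", boxes[-1])
```

Each crop box would be fed to the detector separately and the resulting detections shifted back with `to_global`. Objects that lie in an overlap region will be reported by more than one tile, so the merged detections still need a de-duplication step such as non-maximum suppression, which is not shown here.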


Solution

  • You need to experiment with different image sizes and find the largest one you can train with without running out of memory. Memory consumption also depends on how many images you train on at a time. From what you describe, you will probably end up using an intermediate image size.
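To pick candidate sizes for that experiment, it can help to list a few downscaled resolutions that keep the 6000x4000 aspect ratio. The sketch below does only that, plus the size of the raw input tensor. It assumes 3 channels stored as float32, and the helper names are hypothetical. It does not predict whether a model fits on the GPU: intermediate activations and gradients usually take far more memory than the input and depend on the architecture, so each size still has to be tried in practice.

```python
def scaled_size(width, height, scale):
    """Width and height after uniform scaling (aspect ratio preserved)."""
    return max(1, round(width * scale)), max(1, round(height * scale))


def input_bytes(width, height, channels=3, bytes_per_value=4, batch_size=1):
    """Bytes needed for the input tensor alone (float32 by default)."""
    return width * height * channels * bytes_per_value * batch_size


FULL_W, FULL_H = 6000, 4000
for scale in (1.0, 0.5, 0.25, 0.1):
    w, h = scaled_size(FULL_W, FULL_H, scale)
    mb = input_bytes(w, h) / 1e6
    print(f"scale {scale}: {w}x{h}, {mb:.1f} MB input tensor per image")
```

A reasonable way to use this is to start from the smallest candidate, confirm that training runs, and then move up one size at a time until an out-of-memory error appears.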