machine-learning, computer-vision, gcloud

How to train an ML model to generate predictions for multiple regions in one image


I am working on a home project, helping out a friend of mine who runs a factory with 4 loading docks. He has a camera that can see all 4 loading docks. We want to create a model that takes the image and reports something like "bay 1 occupied, bay 2 free, bay 3 blocked", etc.

I have played with ML a little bit before and have managed to create a model that can tell me whether my garage door is open. The difficulty here is that each training image contains 4 separate statuses, and once the model is up and running I want it to identify each loading bay in the image and give me the status of each.

I am hoping that the AI can do this all for me but I realise I might need to do more work for it!

Another option is to split each image into 4 slices, one per bay, so that I can feed the prediction model a single slice and ask for 1 status back. Is this a more sensible approach (despite it being a pain for me to do this automatically)?


Solution

  • As discussed in the comments, go with your 2nd option: split each image using the fixed bounding boxes of the four bays and train your model on the resulting crops. You can refer to this article, which discusses extracting objects from a picture. As mentioned by @guillaume, if your bays look similar you will need much less data, since the crops from all four bays can be used together. At prediction time, extract the same four crops from the incoming image and run the model on each one.
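
The cropping step itself is not much work to automate. Below is a minimal sketch, assuming Pillow is installed and that you measure the pixel coordinates of each bay once by hand. The coordinates in BAY_BOXES, the 1920x1080 frame size, and the names crop_bays, crop_bays_from_file and predict_bay_statuses are all made up for illustration, and classify stands in for whatever model you end up training.

```python
from typing import Callable, Dict, Tuple

# A box is (left, upper, right, lower) in pixels, the order Pillow's crop() expects.
Box = Tuple[int, int, int, int]

# ASSUMPTION: placeholder coordinates for a 1920x1080 frame.
# Measure the real boxes once from a sample image of your own camera.
BAY_BOXES: Dict[str, Box] = {
    "bay_1": (0, 300, 480, 1080),
    "bay_2": (480, 300, 960, 1080),
    "bay_3": (960, 300, 1440, 1080),
    "bay_4": (1440, 300, 1920, 1080),
}


def validate_boxes(boxes: Dict[str, Box], width: int, height: int) -> None:
    """Raise ValueError if any box is empty or falls outside the frame."""
    for name, (left, upper, right, lower) in boxes.items():
        if not (0 <= left < right <= width and 0 <= upper < lower <= height):
            raise ValueError(f"{name}: box {(left, upper, right, lower)} "
                             f"does not fit a {width}x{height} image")


def crop_bays(image, boxes: Dict[str, Box] = BAY_BOXES) -> dict:
    """Return {bay name: cropped image}.

    `image` is any object with a Pillow-style `.size` and `.crop(box)`.
    """
    width, height = image.size
    validate_boxes(boxes, width, height)
    return {name: image.crop(box) for name, box in boxes.items()}


def crop_bays_from_file(path: str, boxes: Dict[str, Box] = BAY_BOXES) -> dict:
    """Open an image file with Pillow and crop it into one image per bay."""
    from PIL import Image  # imported here so the helpers above work without Pillow

    with Image.open(path) as img:
        return crop_bays(img.convert("RGB"), boxes)


def predict_bay_statuses(image, classify: Callable, boxes: Dict[str, Box] = BAY_BOXES) -> dict:
    """Run a single-image classifier (hypothetical `classify`) on every bay crop."""
    return {name: classify(crop) for name, crop in crop_bays(image, boxes).items()}
```

For building the training set you would call crop_bays_from_file on each saved frame and store the four crops under their labels; for live use, predict_bay_statuses gives one status per bay from a single frame. Because the same boxes are used in both places, the model always sees a bay from the same viewpoint it was trained on.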