machine-learning computer-vision artificial-intelligence object-detection

Object detection vs. image difference for finding an object left on a car seat

I am unsure which approach to take. I need to implement a lost-and-found module that detects whether a taxi passenger has left something on the back seat. A camera will be mounted just above the back seat, and once the passenger leaves the taxi, an AI module will be triggered to check whether anything was left on the seat and alert the driver.

To determine whether something was left behind, I could use object detection, but the detector will not have been trained on every possible object. In many cases passengers just leave a cover or bag with something inside, which is hard to categorize.

Then I thought I could capture an image before the passenger sits down, capture another after they exit, and compute the image difference to find contours. However, different lighting conditions between the two images could be falsely identified as a substantial difference.

[Edited] Also, how about image classification rather than detection? I am not really interested in the location of the object, only in whether an object is there.
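To make the image-difference idea and its lighting problem concrete, here is a rough sketch in plain Python. It assumes grayscale frames stored as nested lists, and it subtracts each frame's mean intensity before comparing, which only cancels a uniform brightness shift (shadows or uneven light would still show up). The function names, the 4x4 frames, and the threshold of 30 are all made up for illustration.

```python
def mean_centered(image):
    """Subtract the image's mean intensity to cancel a global brightness offset."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[p - mean for p in row] for row in image]


def changed_fraction(before, after, pixel_threshold=30):
    """Fraction of pixels whose mean-centered intensity differs by more than the threshold."""
    b, a = mean_centered(before), mean_centered(after)
    changed = sum(
        1
        for row_b, row_a in zip(b, a)
        for pb, pa in zip(row_b, row_a)
        if abs(pb - pa) > pixel_threshold
    )
    total = len(before) * len(before[0])
    return changed / total


# Hypothetical 4x4 grayscale frames (values 0-255), for illustration only.
empty_seat = [[100] * 4 for _ in range(4)]
brighter_empty = [[140] * 4 for _ in range(4)]   # same scene, uniformly brighter
with_object = [row[:] for row in empty_seat]
with_object[1][1] = with_object[1][2] = 220      # a bright item on the seat

same_scene_change = changed_fraction(empty_seat, brighter_empty)
object_change = changed_fraction(empty_seat, with_object)
```

With these toy frames, the uniformly brighter empty seat gives a changed fraction of 0.0, while the frame with the item gives 0.125 (2 of 16 pixels). Real lighting changes are rarely uniform, which is exactly my concern with this approach.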

Can anyone recommend a good approach? Thank you.

Solution

I think you are on the right track. You can go either way, object detection or image classification.

For starters, I think you should try image classification. If the main goal is just to know whether an object is there, it's a binary classification problem, and the labelling effort is much lower: each image needs only one label (object or no object) rather than bounding boxes. You will need to address the class imbalance of having too many background images (no object) against too few images with objects in them, but this is feasible. Start with a pre-trained network like ResNet-50 and see how it goes; you should get pretty good results with this approach.
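As a rough sketch of how the imbalance could be handled, one common option is to weight the loss by inverse class frequency. The helper below uses the heuristic `n_samples / (n_classes * class_count)`; the label counts, the function names, and the choice of a class-weighted cross-entropy loss are assumptions for illustration, not the only way to do it. The PyTorch/torchvision imports are kept inside the second function so the weight helper runs on its own.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def build_classifier_and_loss(weights_by_class):
    """Build a pre-trained ResNet-50 with a 2-way head and a class-weighted loss.

    Requires torch and torchvision; imported here so the helper above has no dependencies.
    """
    import torch
    from torch import nn
    from torchvision.models import resnet50, ResNet50_Weights

    model = resnet50(weights=ResNet50_Weights.DEFAULT)   # ImageNet pre-trained weights
    model.fc = nn.Linear(model.fc.in_features, 2)        # 0 = empty seat, 1 = object left
    class_weights = torch.tensor(
        [weights_by_class[0], weights_by_class[1]], dtype=torch.float32
    )
    loss_fn = nn.CrossEntropyLoss(weight=class_weights)  # rare class counts more
    return model, loss_fn


# Hypothetical dataset: 900 empty-seat images and 100 images with an object.
labels = [0] * 900 + [1] * 100
weights = balanced_class_weights(labels)
```

With this made-up 900/100 split, the "object" class gets a weight of 5.0 and the "empty" class about 0.56, so mistakes on the rare class cost more during training. Oversampling the object images or adding augmented copies are alternatives worth trying as well.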

For object detection, the issue I see is the number of possible objects that could be left on the seat and their possible variations. Yes, there are a few common ones, and you can train a network on them, but if a completely different object appears that you didn't train on, the network won't detect it at all. The variations of each object also matter here: in my experience, object detection needs a lot of data covering various positions and styles of the objects. It's possible, but difficult. To test this, models pre-trained on COCO should give you an overall perspective.
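A quick way to try this could look like the sketch below, which runs torchvision's COCO-pre-trained Faster R-CNN on one image and keeps the confident detections. The choice of Faster R-CNN, the 0.5 threshold, the helper names, and the example values at the bottom are assumptions for illustration; any COCO-trained detector would serve the same purpose.

```python
def confident_detections(labels, scores, categories, threshold=0.5):
    """Return (category_name, score) pairs whose score is at least the threshold."""
    return [
        (categories[label], score)
        for label, score in zip(labels, scores)
        if score >= threshold
    ]


def detect_with_coco_model(image_path, threshold=0.5):
    """Run a COCO-pre-trained Faster R-CNN on one image file.

    Requires torch and torchvision; the weights are downloaded on first use.
    """
    import torch
    from torchvision.io import read_image, ImageReadMode
    from torchvision.models.detection import (
        fasterrcnn_resnet50_fpn,
        FasterRCNN_ResNet50_FPN_Weights,
    )

    weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
    model = fasterrcnn_resnet50_fpn(weights=weights).eval()
    image = read_image(image_path, mode=ImageReadMode.RGB)  # uint8 tensor, (3, H, W)
    batch = [weights.transforms()(image)]                   # preprocessing for this model
    with torch.no_grad():
        output = model(batch)[0]                            # dict: boxes, labels, scores
    return confident_detections(
        output["labels"].tolist(),
        output["scores"].tolist(),
        weights.meta["categories"],
        threshold,
    )


# Hypothetical detector output, only to show what the filtering step returns.
example_categories = ["__background__", "person", "backpack", "handbag"]
example = confident_detections([2, 3, 1], [0.9, 0.3, 0.6], example_categories)
```

Running `detect_with_coco_model` on a handful of back-seat photos with typical left-behind items shows quickly which objects the COCO classes cover and which ones are missed entirely.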