What are the fastest ways to do object detection (context in the question)?

I recently tried to make a bot that fishes in Minecraft as a challenge. (I don't use it in any community or modify the game's code, so I guess it's OK with the TOS.) My approach was, and so far remains, to track the movements of the bobber.

My first bot relied on color-space segmentation, fine-tuning the resulting image with morphological transformations from OpenCV-Python (as part of my learning experience, I aimed to make the bot purely computer-vision based). That bot only worked in a specific location where I had set the illumination and environment color with in-game methods. It also required turning the game's graphics down to the lowest settings to disable particles.

My second bot used Haar-like classifiers, since I had already made a few models for real-life objects that were fairly good. Sadly, this time (I assume due to the game's unique graphic style, where essentially everything is a cube with textures mapped onto it) detection was fairly inconsistent and produced a lot of false positives.

My third bot used an SVM on HOG features, but it was fairly slow for all the models I trained, from one with more than 4000 original samples with tightly fitted bounding boxes down to one with about 200. Because of that lack of speed, the fish was already off the hook by the time detection occurred.

My last attempt used TensorFlow Lite and failed miserably due to even worse detection speed.

I also looked into the possibility of doing motion detection by comparing consecutive frames, into the speed benefits of Java vs. Python, and into different preprocessing options such as increasing contrast, reducing the color palette, etc.
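To make the frame-comparison idea concrete, here is a rough sketch of frame differencing on two grayscale frames using NumPy only. The function name `motion_box` and the threshold value are assumptions for illustration, not something I have tuned for the game.

```python
import numpy as np

def motion_box(prev_gray, curr_gray, threshold=25):
    """Return (x_min, y_min, x_max, y_max) around changed pixels, or None.

    Both inputs are 2-D uint8 arrays of the same shape. The threshold
    (a hypothetical value) ignores small brightness fluctuations.
    """
    # Cast to a signed type first so the subtraction cannot wrap around.
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    ys, xs = np.nonzero(diff > threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

This is cheap per frame, but it reacts to anything that moves (water animation, particles, mobs), so it would probably need to be restricted to a small region around the last known bobber position.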

At this point I don't know whether wandering 'blind' will give me any clues about what the 'go-to' approach would be, so I decided to ask here.

Thanks in advance.

P.S. For exact specifics - I think the time to reel is approximately 0.7 seconds but I can be slightly off.
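As a rough way to check any detector against that window, here is a small timing sketch. The helper names and the worst-case model (one capture interval of waiting plus one detection pass) are my own simplifying assumptions; it ignores input latency and uses the approximate 0.7 s figure from above.

```python
import time

REEL_WINDOW_S = 0.7  # approximate figure from the post, may be slightly off

def mean_latency(detect, frames):
    """Average wall-clock seconds that `detect` spends per frame."""
    start = time.perf_counter()
    for frame in frames:
        detect(frame)
    return (time.perf_counter() - start) / len(frames)

def fits_window(latency_s, capture_interval_s, window_s=REEL_WINDOW_S):
    """Assumed worst case: the bite happens right after a frame was grabbed,
    so the bot waits one capture interval and then runs one detection pass."""
    return capture_interval_s + latency_s < window_s
```

For example, `fits_window(mean_latency(my_detector, sample_frames), 1 / 20)` would test a hypothetical `my_detector` against screen capture at 20 frames per second.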

Solution

For a fast and straightforward object detection technique, I would suggest using a pretrained RetinaNet. You can find all the explanation you need at this link: https://github.com/fizyr/keras-retinanet

Then follow this Colab notebook for fast training and a straightforward implementation: https://colab.research.google.com/drive/1v3nzYh32q2rm7aqOaUDvqZVUmShicAsT

I would suggest that you use ResNet50 as the backbone and start your training from the pretrained weights.