I am new to deep learning and I am working on a hobby project related to soccer analytics. I want to take soccer videos and convert them onto a 2D map. I have broken the process down into smaller steps; the first step is to detect the players and the ball.
I am thinking of starting with a pre-trained object detection model. I took a video, extracted frames from it, and ran the model on one of the frames (output attached). It is clearly missing a lot of detections. One way to solve this is transfer learning, for which I will have to generate my own dataset. The only way I can think of is to slice each image into windows and label them manually as players or the ball.
This seems like a tedious task. Are there other efficient ways of generating data? What are some best practices?
This is geared more toward long-term development, but since I already wrote a similar answer elsewhere, I'm posting it here.
For annotation, any of these free tools will do:
https://github.com/developer0hye/Yolo_Label (works great, but Windows only)
https://github.com/AlexeyAB/Yolo_mark
https://github.com/heartexlabs/label-studio (a more complex annotation tool that covers many other tasks)
With these tools, it should not take more than a few hours to annotate the data.
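All of these tools can export labels in YOLO's plain-text format: one `.txt` file per image, one line per object, with the class id followed by the box center, width, and height, all normalized to [0, 1]. If you ever need to convert pixel-space boxes yourself (e.g. from some other tool's export), a minimal sketch looks like this (the example box coordinates are made up):

```python
def to_yolo_line(cls_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space bounding box to a normalized YOLO label line:
    '<class> <x_center> <y_center> <width> <height>', all in [0, 1]."""
    x_c = (x_min + x_max) / 2.0 / img_w
    y_c = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{cls_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# e.g. a player box on a 1920x1080 frame, with class 0 = player
print(to_yolo_line(0, 900, 400, 960, 520, 1920, 1080))
# → 0 0.484375 0.425926 0.031250 0.111111
```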
There are also many augmentation tools, e.g.:
https://github.com/mdbloice/Augmentor
https://github.com/wagonhelm/rotation_augment (if you want to apply rotation to the images)
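Keep in mind that geometric augmentations have to be applied to the labels as well as the pixels. The simplest useful one for soccer is a horizontal flip (the field is roughly left-right symmetric), which only requires mirroring each box's x-center; a sketch of the label side of that transform:

```python
def hflip_yolo_labels(lines):
    """Horizontally flip YOLO label lines: x_center becomes 1 - x_center,
    while y_center, width, and height are unchanged."""
    out = []
    for line in lines:
        cls_id, x, y, w, h = line.split()
        out.append(f"{cls_id} {1.0 - float(x):.6f} {y} {w} {h}")
    return out

labels = ["0 0.484375 0.425926 0.031250 0.111111"]
print(hflip_yolo_labels(labels))
# → ['0 0.515625 0.425926 0.031250 0.111111']
```

The image itself would be mirrored with whatever image library you use; rotations are trickier because an axis-aligned box no longer stays axis-aligned, which is why a dedicated tool like rotation_augment is worth using.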
I would suggest going with darknet YOLO, which is written in C; you won't need to write any major code yourself, and it is fast and accurate.
https://pjreddie.com/darknet/yolo/
Use this repo if you're on Linux: https://github.com/pjreddie/darknet
Use this one if you're on Windows: https://github.com/AlexeyAB/darknet
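Darknet's training setup is file-based: a `train.txt`/`valid.txt` listing image paths, an `obj.names` file with one class name per line, and an `obj.data` file pointing at all of them. A sketch of generating the split and the data file (the `data/obj/` layout and the 90/10 split are assumptions; darknet finds each image's labels in the `.txt` file sitting next to it):

```python
import random
from pathlib import Path

# Assumed layout: all annotated frames live in data/obj/ as .jpg files,
# each with a matching .txt YOLO label file next to it.
Path("data/obj").mkdir(parents=True, exist_ok=True)
images = sorted(Path("data/obj").glob("*.jpg"))
random.seed(0)
random.shuffle(images)
split = int(0.9 * len(images))  # 90% train / 10% validation

Path("data/train.txt").write_text("\n".join(str(p) for p in images[:split]))
Path("data/valid.txt").write_text("\n".join(str(p) for p in images[split:]))

# obj.data tells darknet where everything lives (2 classes: player, ball)
Path("data/obj.names").write_text("player\nball\n")
Path("data/obj.data").write_text(
    "classes = 2\n"
    "train = data/train.txt\n"
    "valid = data/valid.txt\n"
    "names = data/obj.names\n"
    "backup = backup/\n"
)
```

You would still need to edit the `.cfg` file (number of classes and the filter counts in the layers before each YOLO layer) as described in the darknet README.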
https://github.com/zabir-nabil/yolov3-anchor-clustering (for computing custom anchor boxes from your own dataset)
https://github.com/zabir-nabil/tf-model-server4-yolov3 (for serving a trained YOLOv3 model)
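YOLO's default anchors are tuned for COCO; for soccer footage, where most boxes are similarly-shaped players plus a tiny ball, recomputing anchors can help. The standard approach (what the anchor-clustering repo above automates) is k-means over the labels' normalized (w, h) pairs using 1 - IoU as the distance. A minimal pure-Python sketch of that idea:

```python
import random

def iou_wh(box, anchor):
    """IoU between two (w, h) boxes, both imagined anchored at the origin."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs with k-means, assigning each box to the
    anchor it overlaps most (i.e. minimizing 1 - IoU)."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, anchors[i]))
            clusters[best].append(b)
        # Recompute each anchor as the mean of its cluster;
        # keep the old anchor if a cluster went empty.
        anchors = [
            (sum(b[0] for b in c) / len(c), sum(b[1] for b in c) / len(c))
            if c else anchors[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(anchors)
```

The resulting anchors, scaled up to the network input size (e.g. 416x416), go into the `anchors=` line of the cfg file.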
There are some pre-trained pedestrian-detection models (you'll easily find them on GitHub), but they won't give you very good performance here because of the very different background and the motion artifacts in sports footage.