I am new to deep learning and I am working on a hobby project related to soccer analytics. I want to take soccer videos and convert them onto a 2D map. I have broken the process down into smaller steps; the first step is to detect the players and the ball.
I am thinking of starting with a pre-trained object detection model. I took a video, extracted frames from it, and ran the model on one of the frames (output attached). It is clearly missing a lot of detections. One way to solve this is transfer learning, for which I will have to generate my own dataset. The only way I can think of is to slice each image into windows and label them manually as players or the ball.
This seems like a tedious task. Are there other efficient ways of generating data? What are some best practices?
This is geared more toward long-term development, but since I already wrote a similar answer elsewhere, I'm posting it here.
For annotation, any of these free tools will do:
https://github.com/developer0hye/Yolo_Label (works great, but Windows only)
https://github.com/AlexeyAB/Yolo_mark
https://github.com/heartexlabs/label-studio (a more complex annotation tool that covers many other tasks)
With these tools, it should not take more than a few hours to annotate the data.
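All of these tools can export labels in YOLO's plain-text format: one `.txt` file per image, one line per object, with the class id followed by the box center, width, and height, all normalized to [0, 1]. If you ever need to convert pixel-space boxes yourself (e.g. from some other tool's export), a minimal sketch looks like this (the example box coordinates are made up):

```python
def to_yolo_line(cls_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space bounding box to a normalized YOLO label line:
    '<class> <x_center> <y_center> <width> <height>', all in [0, 1]."""
    x_c = (x_min + x_max) / 2.0 / img_w
    y_c = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{cls_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# e.g. a player box on a 1920x1080 frame, with class 0 = player
print(to_yolo_line(0, 900, 400, 960, 520, 1920, 1080))
# → 0 0.484375 0.425926 0.031250 0.111111
```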
There are also many augmentation tools, e.g.:
https://github.com/mdbloice/Augmentor
https://github.com/wagonhelm/rotation_augment (if you want to apply rotation to the images)
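Keep in mind that geometric augmentations have to be applied to the labels as well as the pixels. The simplest useful one for soccer is a horizontal flip (the field is roughly left-right symmetric), which only requires mirroring each box's x-center; a sketch of the label side of that transform:

```python
def hflip_yolo_labels(lines):
    """Horizontally flip YOLO label lines: x_center becomes 1 - x_center,
    while y_center, width, and height are unchanged."""
    out = []
    for line in lines:
        cls_id, x, y, w, h = line.split()
        out.append(f"{cls_id} {1.0 - float(x):.6f} {y} {w} {h}")
    return out

labels = ["0 0.484375 0.425926 0.031250 0.111111"]
print(hflip_yolo_labels(labels))
# → ['0 0.515625 0.425926 0.031250 0.111111']
```

The image itself would be mirrored with whatever image library you use; rotations are trickier because an axis-aligned box no longer stays axis-aligned, which is why a dedicated tool like rotation_augment is worth using.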
I would suggest going with darknet YOLO, which is written in C; you won't need to write any major code yourself, and it is fast and accurate.
https://pjreddie.com/darknet/yolo/
Use this repo if you're on Linux: https://github.com/pjreddie/darknet
Use this one if you're on Windows: https://github.com/AlexeyAB/darknet
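Darknet's training setup is file-based: a `train.txt`/`valid.txt` listing image paths, an `obj.names` file with one class name per line, and an `obj.data` file pointing at all of them. A sketch of generating the split and the data file (the `data/obj/` layout and the 90/10 split are assumptions; darknet finds each image's labels in the `.txt` file sitting next to it):

```python
import random
from pathlib import Path

# Assumed layout: all annotated frames live in data/obj/ as .jpg files,
# each with a matching .txt YOLO label file next to it.
Path("data/obj").mkdir(parents=True, exist_ok=True)
images = sorted(Path("data/obj").glob("*.jpg"))
random.seed(0)
random.shuffle(images)
split = int(0.9 * len(images))  # 90% train / 10% validation

Path("data/train.txt").write_text("\n".join(str(p) for p in images[:split]))
Path("data/valid.txt").write_text("\n".join(str(p) for p in images[split:]))

# obj.data tells darknet where everything lives (2 classes: player, ball)
Path("data/obj.names").write_text("player\nball\n")
Path("data/obj.data").write_text(
    "classes = 2\n"
    "train = data/train.txt\n"
    "valid = data/valid.txt\n"
    "names = data/obj.names\n"
    "backup = backup/\n"
)
```

You would still need to edit the `.cfg` file (number of classes and the filter counts in the layers before each YOLO layer) as described in the darknet README.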
https://github.com/zabir-nabil/yolov3-anchor-clustering (for computing custom anchor boxes from your own dataset)
https://github.com/zabir-nabil/tf-model-server4-yolov3 (for serving a trained YOLOv3 model)
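YOLO's default anchors are tuned for COCO; for soccer footage, where most boxes are similarly-shaped players plus a tiny ball, recomputing anchors can help. The standard approach (what the anchor-clustering repo above automates) is k-means over the labels' normalized (w, h) pairs using 1 - IoU as the distance. A minimal pure-Python sketch of that idea:

```python
import random

def iou_wh(box, anchor):
    """IoU between two (w, h) boxes, both imagined anchored at the origin."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs with k-means, assigning each box to the
    anchor it overlaps most (i.e. minimizing 1 - IoU)."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, anchors[i]))
            clusters[best].append(b)
        # Recompute each anchor as the mean of its cluster;
        # keep the old anchor if a cluster went empty.
        anchors = [
            (sum(b[0] for b in c) / len(c), sum(b[1] for b in c) / len(c))
            if c else anchors[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(anchors)
```

The resulting anchors, scaled up to the network input size (e.g. 416x416), go into the `anchors=` line of the cfg file.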
There are some pre-trained pedestrian-detection models (you'll easily find them on GitHub), but they won't give you very good performance here because of the very different background and the motion artifacts in sports footage.