python opencv keras neural-network object-detection

Detecting small custom objects using Keras

I want to detect small objects (9x9 px) in my images (around 1200x900) using neural networks. Searching the net, I've found several web pages with Keras code that uses customized layers for custom object classification. As I understand it, in this case you need to provide images where your object appears alone. Although the training goes well and the network classifies them properly, I haven't found how to later load this trained network to find objects in my big images.

On the other hand, I have found that I can do this using the cnn class in cv if I load the weights from the YOLOv3 network. In this case I provide the big images with the proper annotations, but the network is not well trained...

Given this context, could someone show me how to load weights into cnn that were trained with a customized network, and how to train that network?

Solution

After a lot of searching, I've found a better approach:

Cut your images into subimages (I cut mine into 2 rows and 4 columns).
Feed YOLO with these subimages and their corresponding annotations. I used YOLOv3-tiny with a network size of 960x960 for 10k steps. In my case, intensity and color were important, so the random augmentation parameters hue, saturation and exposure were kept at 0. Use random angles. If your objects do not change in size, disable random in the [yolo] layers (random=0 in the cfg file; this option only randomizes the network input size during training). For this, I'm using AlexeyAB's darknet fork. If some of your objects are blurred, add blur=1 to the [net] section of the cfg file (after hue). For blur you need AlexeyAB's fork compiled with OpenCV (and with CUDA if you can).
Calculate anchors with AlexeyAB's fork. Cluster_num is the number of anchor pairs you use; you can find it by opening your cfg and looking at any anchors= line. Anchors are the sizes of the boxes that darknet uses to predict the positions.
Change the cfg to use your new anchors. If your objects have a fixed size, the anchors will be very close in size. I kept the ones for bigger objects (first [yolo] layer), but for the second layer, the tiny ones, I modified them and even removed one pair. If you remove any, update the mask= entries in all [yolo] layers accordingly; mask refers to the indices of the anchors, starting at 0. Also change num= inside each [yolo] layer.
After this, detection is quite good. If you run detection on a video, some objects may be lost in some frames. You can try to avoid this by using the lstm cfg: https://github.com/AlexeyAB/darknet/issues/3114
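
The first two steps (cutting the images and producing matching annotations for each subimage) can be sketched roughly as below. This is only an illustration, not the exact script I used: it assumes the annotations are in darknet's text format (class cx cy w h, all normalized to the full image), and the function names tile_bounds and remap_boxes are made up for this example. How to treat objects that straddle a tile border is also an assumption here: a box is assigned to the tile containing its centre and clipped to that tile.

```python
from typing import List, Tuple

# (class_id, cx, cy, w, h), all normalized to [0, 1] as in darknet label files
Box = Tuple[int, float, float, float, float]
Tile = Tuple[int, int, int, int]  # pixel bounds (x0, y0, x1, y1)


def tile_bounds(img_w: int, img_h: int, rows: int = 2, cols: int = 4) -> List[Tile]:
    """Return the pixel bounds of each tile, row by row."""
    bounds = []
    for r in range(rows):
        for c in range(cols):
            x0, x1 = c * img_w // cols, (c + 1) * img_w // cols
            y0, y1 = r * img_h // rows, (r + 1) * img_h // rows
            bounds.append((x0, y0, x1, y1))
    return bounds


def remap_boxes(boxes: List[Box], img_w: int, img_h: int, tile: Tile) -> List[Box]:
    """Re-express full-image boxes relative to one tile.

    Boxes whose centre lies outside the tile are dropped (assumption);
    boxes crossing the tile border are clipped to the tile (assumption).
    """
    x0, y0, x1, y1 = tile
    tw, th = x1 - x0, y1 - y0
    out = []
    for cls, cx, cy, w, h in boxes:
        # Centre of the box in full-image pixels.
        px, py = cx * img_w, cy * img_h
        if not (x0 <= px < x1 and y0 <= py < y1):
            continue
        # Box corners in pixels, clipped to the tile.
        left = max(px - w * img_w / 2, x0)
        right = min(px + w * img_w / 2, x1)
        top = max(py - h * img_h / 2, y0)
        bottom = min(py + h * img_h / 2, y1)
        # Back to normalized coordinates, now relative to the tile.
        out.append((cls,
                    ((left + right) / 2 - x0) / tw,
                    ((top + bottom) / 2 - y0) / th,
                    (right - left) / tw,
                    (bottom - top) / th))
    return out


if __name__ == "__main__":
    # One 9x9 px object centred at (450, 225) in a 1200x900 image.
    full_boxes = [(0, 450 / 1200, 225 / 900, 9 / 1200, 9 / 900)]
    for i, tile in enumerate(tile_bounds(1200, 900)):
        # With the image as a NumPy/OpenCV array, the crop itself would be
        # image[tile[1]:tile[3], tile[0]:tile[2]].
        print(i, tile, remap_boxes(full_boxes, 1200, 900, tile))
```

Each tile's boxes would then be written to a label file next to the cropped subimage, one line per box. With 9x9 px objects, a box cut by a tile border may end up too small to be useful, so it may be worth discarding those instead of clipping them.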

Now, if you also want to track them, you can apply the Deep SORT algorithm on top of your trained YOLO network. For example, you can convert your trained network to Keras using https://github.com/allanzelener/YAD2K (add this commit for tiny YOLOv3: https://github.com/allanzelener/YAD2K/pull/154/commits/e76d1e4cd9da6e177d7a9213131bb688c254eb20) and then use https://github.com/Qidian213/deep_sort_yolov3

As an alternative, you can train with Mask R-CNN or any other Faster R-CNN-style algorithm and then look into Deep SORT for tracking.