Fine-tuning SSD MobileNet

I am currently working on vehicle detection using SSD MobileNet with the TensorFlow Object Detection API. I have made a custom dataset from the COCO dataset that contains all the vehicle categories in COCO, i.e. car, bicycle, motorcycle, bus and truck, and I also have a dataset of 730 rickshaw images.

Ultimately, my goal is to detect rickshaws along with the other vehicles, but so far I have failed.

There are a total of 16000 instances in train_labels.csv; on average, each class has 2300 instances. I set the batch size to 12 and then trained the COCO pre-trained model on my custom dataset for 12000 steps.
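
To check the class ratio before training, a quick count of boxes per class can help. This is a minimal sketch, assuming train_labels.csv has one row per bounding box and a header with a `class` column (the layout produced by the common `xml_to_csv` conversion scripts); `class_distribution` is a hypothetical helper, not part of the API.

```python
import csv
from collections import Counter


def class_distribution(lines, class_column="class"):
    """Count annotated boxes per class in a train_labels.csv-style file.

    `lines` is any iterable of CSV text lines (an open file object works).
    Assumes one row per bounding box and a header row naming the columns.
    """
    counts = Counter(row[class_column] for row in csv.DictReader(lines))
    total = sum(counts.values())
    # Fraction of all boxes belonging to each class, e.g. 0.25 means 25 %.
    shares = {name: n / total for name, n in counts.items()} if total else {}
    return counts, shares
```

With a real file, `open("train_labels.csv", newline="")` can be passed in directly as `lines`; a class whose share is far below the others (such as a 730-image rickshaw set next to thousands of COCO boxes) is the imbalance worth looking at first.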

Unfortunately, I have not been able to get good results: after training, the model failed to classify the other vehicles.

Any advice would be highly appreciated: on the ratio of each class in the dataset, on whether I need more rickshaw images, on how many layers I should freeze, or maybe a different perspective altogether.

Solution

Since you have a custom dataset of 730 rickshaw images, I think there is no need to extract a separate dataset of the other vehicles from COCO for fine-tuning. What I mean is that the TensorFlow pre-trained model is already really good at detecting every vehicle other than the rickshaw. Your task is just to teach the model how to detect rickshaws.
Another option, since you already have a vehicle dataset, is to train a model starting from the COCO checkpoints: https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9
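
Either option needs a label map whose names match the class strings written into the TFRecords. The sketch below renders one as text; `make_label_map` is a hypothetical helper, and the class list just repeats the categories named in the question (their order is an assumption). IDs start at 1 because the Object Detection API reserves 0 for the background class.

```python
def make_label_map(class_names):
    """Render label_map.pbtxt text for the TensorFlow Object Detection API.

    IDs are assigned in list order starting at 1 (0 is the background class).
    """
    items = []
    for idx, name in enumerate(class_names, start=1):
        items.append(f"item {{\n  id: {idx}\n  name: '{name}'\n}}\n")
    return "\n".join(items)


# Option 1: teach the COCO model only the new class.
RICKSHAW_ONLY = ["rickshaw"]
# Option 2: train on all vehicle classes, starting from the COCO checkpoint.
ALL_VEHICLES = ["car", "bicycle", "motorcycle", "bus", "truck", "rickshaw"]
```

The number of entries has to agree with `num_classes` in the pipeline config: 1 for the rickshaw-only label map, 6 for the combined one.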

Go through the above article; it will give you a fair idea of the end-to-end flow. The author fine-tuned an SSD MobileNet model trained on the COCO dataset to detect raccoons, which were the only new class the author wanted to detect. In your case, you just have to replace the raccoon images with rickshaw images and follow exactly the same steps. The author used Google Cloud, but you can change the config file to fine-tune the model on a local machine. Since you have only 730 new images, fine-tuning shouldn't take long.
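
For local training, the main edits are to the sample `pipeline.config` shipped with the model. A minimal sketch, assuming a TF1-style sample config where the fields are named `num_classes`, `fine_tune_checkpoint`, `label_map_path`, `batch_size` and `num_steps`; the helper names, paths and the 20-epoch target are hypothetical. The record paths (`input_path`) differ between the train and eval readers, so they are left to be edited by hand.

```python
import math
import re


def steps_for_epochs(num_images, batch_size, epochs):
    """Training steps needed to pass over `num_images` about `epochs` times."""
    return math.ceil(num_images * epochs / batch_size)


def patch_pipeline_config(text, num_classes, checkpoint, label_map,
                          batch_size, num_steps):
    """Overwrite common fields in a text-format pipeline.config.

    Every match is replaced, e.g. label_map_path appears in both the
    train and eval input readers. Field names are assumed, not verified
    against every API version.
    """
    replacements = [
        (r"\bnum_classes:\s*\d+", f"num_classes: {num_classes}"),
        (r'\bfine_tune_checkpoint:\s*".*?"', f'fine_tune_checkpoint: "{checkpoint}"'),
        (r'\blabel_map_path:\s*".*?"', f'label_map_path: "{label_map}"'),
        (r"\bbatch_size:\s*\d+", f"batch_size: {batch_size}"),
        (r"\bnum_steps:\s*\d+", f"num_steps: {num_steps}"),
    ]
    for pattern, new in replacements:
        # A callable replacement avoids re-interpreting backslashes in paths.
        text = re.sub(pattern, lambda _m, value=new: value, text)
    return text
```

For 730 images at batch size 12, twenty passes come to `steps_for_epochs(730, 12, 20)`, i.e. 1217 steps, which is why fine-tuning on a small new-class dataset is quick. Regex substitution only keeps the sketch dependency-free; the API's own protobuf config utilities are likely the more robust route.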

Here is another good example in case anything is unclear: https://towardsdatascience.com/building-a-toy-detector-with-tensorflow-object-detection-api-63c0fdf2ac95

As for your question about whether you need more data: more data is always better. What I would suggest is to fine-tune the model using the steps above and check the mAP. If the mAP is low and the performance is not good enough for your intended application, collect more data and fine-tune again.
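
To make "check the mAP" concrete, here is a rough sketch of how one class's average precision is computed: detections are matched to ground-truth boxes at IoU ≥ 0.5, ranked by score, and precision is summed over each step in recall. This follows PASCAL VOC-style all-point interpolation; COCO's headline mAP additionally averages over IoU thresholds from 0.5 to 0.95. The function names are hypothetical, and for reported numbers the API's own evaluation job should be preferred.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(detections, ground_truths, iou_threshold=0.5):
    """Greedily flag detections as true/false positives for one class in one image.

    detections: list of (score, box); ground_truths: list of boxes.
    Returns (scores, is_tp) in descending-score order.
    """
    used = set()
    scores, is_tp = [], []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best, best_iou = None, iou_threshold
        for j, gt in enumerate(ground_truths):
            overlap = iou(box, gt)
            # Each ground-truth box may be claimed by at most one detection.
            if j not in used and overlap >= best_iou:
                best, best_iou = j, overlap
        if best is not None:
            used.add(best)
        scores.append(score)
        is_tp.append(best is not None)
    return scores, is_tp


def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP for one class from detections pooled over images."""
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Precision envelope: each point takes the best precision at the same or higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum precision times the recall gained at each ranked detection.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

Per-image results from `match_detections` are concatenated before calling `average_precision`, and mAP is the mean of the per-class AP values, so a low AP for the rickshaw class in particular is the signal to collect more rickshaw data.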

Please let me know if you have any questions.