I've started learning ML with TensorFlow/DeepLab. I tried to train my own model from scratch for clothes recognition, using semantic segmentation with the mobilenet_v2 model variant, but I'm not getting usable results.
I'm using tensorflow/models for the tfrecord export and for training, and the deeplab example code for visualization and testing (renamed locally to main.py). I modified a few lines so that it loads my local models and test image.
I'll show the process I followed:
Export the "pascal dataset" directory to tfrecord like this (I'm in the "models-master/research/deeplab/datasets" folder):
py build_voc2012_data.py --image_folder="pasc_imgs/JPEGImages" --semantic_segmentation_folder="pasc_imgs/SegmentationClassPNG" --list_folder="pasc_imgs/ImageSets" --image_format="jpg" --output_dir="train/tfrecord"
I edited "models-master/research/deeplab/data_generator.py" like this: {'train': 85, 'trainval': 100, 'val': 15}, num_classes=2.
py train.py --logtostderr --training_number_of_steps=10000 --train_split="train" --model_variant="mobilenet_v2" --output_stride=16 --decoder_output_stride=4 --train_batch_size=1 --dataset="pascal_voc_seg" --train_logdir="datasets/train/deeplab_model_mn" --dataset_dir="datasets/train/tfrecord"
py export_model.py --checkpoint_path=datasets/train/deeplab_model_mn/model.ckpt-10000 --export_path=datasets/train/deeplab_inference_mn/frozen_inference_graph.pb --model_variant="mobilenet_v2" --output_stride=16 --num_classes=2
data_generator.py (edited lines):
_PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 85,
        'trainval': 100,
        'val': 15,
    },
    num_classes=2,  # 0: background, 1: shirt
    ignore_label=255,
)
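One thing worth ruling out with an edit like this is a mismatch between the declared split sizes and the list files that were actually exported. The sketch below assumes that each split is a plain-text file named after the split (train.txt, val.txt, trainval.txt) inside the --list_folder directory, with one image name per line. The helper names count_split_sizes and find_mismatches are made up for illustration and are not part of DeepLab.

```python
from pathlib import Path


def count_split_sizes(list_folder):
    """Count non-empty lines in every *.txt list file of a folder.

    Assumption: each file is named after its split and holds one
    image name per line.
    """
    sizes = {}
    for list_file in sorted(Path(list_folder).glob("*.txt")):
        with list_file.open() as f:
            sizes[list_file.stem] = sum(1 for line in f if line.strip())
    return sizes


def find_mismatches(declared, actual):
    """Return {split: (declared_size, actual_size)} for splits that differ.

    A split missing on one side is reported with None on that side.
    """
    names = sorted(set(declared) | set(actual))
    return {
        name: (declared.get(name), actual.get(name))
        for name in names
        if declared.get(name) != actual.get(name)
    }
```

Calling find_mismatches({'train': 85, 'trainval': 100, 'val': 15}, count_split_sizes("pasc_imgs/ImageSets")) should return an empty dict if the numbers in splits_to_sizes agree with the list files.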
Image example (1/100) that I'm using for training (I used the labelMe utility):
shirt_001.jpg
shirt_001.png
main.py result for mobilenetv2_coco_voc_trainaug (it labels the shirt as a person, which is fine) and for my custom model:
mobilenetv2_coco_voc_trainaug result
my custom model result
As you can see, my model fails. I've tried many combinations without success. What should I do? Thank you!
OK, I had the same problem and got it working after many attempts.
First, make sure your masks are correct. With a single class, save the masks with an indexed color map in which every pixel value is 0 or 1: 0 for background, 1 for the object. (An 8-bit indexed color map has room for 256 entries, but only indices 0 and 1 should appear in the mask.)
Second, you need a bigger dataset. With a dataset of about 200 images I got no results, even with correct masks and even at checkpoint 30k. With 450 images I started to see some results, but only from around step 9000. There was no further improvement after about step 18000, but the results were plausible (though far from ideal). I then trained a model on 1100 images, and the results were the same.
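The mask requirement is easy to check before training. Here is a minimal sketch, assuming the masks are single-channel PNGs (Pillow mode "P" or "L") and that 255 is the ignore label, as in the question's data_generator.py. The function names are made up for illustration; Pillow is only needed for the file-based helper.

```python
from collections import Counter


def find_bad_labels(pixels, num_classes=2, ignore_label=255):
    """Return {value: count} for pixel values that are not valid labels.

    `pixels` is any iterable of ints (a bytes object works too).
    Valid labels are 0..num_classes-1 plus the ignore label.
    """
    allowed = set(range(num_classes)) | {ignore_label}
    counts = Counter(pixels)
    return {value: n for value, n in counts.items() if value not in allowed}


def find_bad_labels_in_file(path, num_classes=2, ignore_label=255):
    """Check one mask file; raises if the PNG is not single-channel."""
    # Imported lazily so the pure-Python checker above has no dependency.
    from PIL import Image

    with Image.open(path) as img:
        if img.mode not in ("P", "L"):
            # An RGB(A) mask stores colors, not class indices.
            raise ValueError(f"{path}: expected mode P or L, got {img.mode}")
        # For modes P and L, tobytes() yields one byte per pixel,
        # which is the palette index / gray value.
        return find_bad_labels(img.tobytes(), num_classes, ignore_label)
```

Running find_bad_labels_in_file on every PNG in the segmentation folder (for example "pasc_imgs/SegmentationClassPNG" from the question) and getting an empty dict each time suggests the masks contain only the expected class indices. A non-empty result such as {128: 5120} would mean the mask still stores a color value instead of a class index.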