python tensorflow semantic-segmentation deeplab mobilenet

My custom mobilenet trained model is not showing any results. What am I doing wrong?

I started learning ML with TensorFlow/DeepLab. I tried to train my own model from scratch for clothing recognition, using semantic segmentation with the mobilenet_v2 model variant, but I don't get any results.

I'm using tensorflow/models for the TFRecord export and for training, and the deeplab/example code for visualization and testing (renamed locally to main.py). I modified a few lines in it so that it loads my local models and test image.

I'll show the process I followed:

- I downloaded 100 JPEG images for a single class: shirts. (I know this isn't a big dataset, but I assumed it would be enough to try.)
- I created a segmentation class PNG for each image.
- I created the image set definition files: train (85 file names), trainval (100 file names) and val (15 file names).

So my "pascal dataset" directory contains the ImageSets, JPEGImages and SegmentationClassPNG folders.
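A minimal sketch of generating the three list files, assuming all JPEGs sit in one folder and a random 85/15 train/val split is acceptable. The helper name `write_split_files` and the fixed seed are my own illustrative choices, not part of DeepLab. As far as I can tell, build_voc2012_data.py expects one file name per line without the extension and appends "." plus --image_format itself, so only files ending in exactly ".jpg" are listed.

```python
import os
import random


def write_split_files(image_dir, list_dir, num_val=15, seed=0):
    """Write train.txt, val.txt and trainval.txt for build_voc2012_data.py.

    Each line is a file name without extension (e.g. "shirt_001");
    the conversion script appends the image/label extensions itself.
    """
    names = sorted(
        os.path.splitext(f)[0]
        for f in os.listdir(image_dir)
        if f.endswith(".jpg")  # matches --image_format="jpg"
    )
    shuffled = names[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    splits = {
        "train": sorted(shuffled[num_val:]),
        "val": sorted(shuffled[:num_val]),
        "trainval": names,  # train + val
    }
    os.makedirs(list_dir, exist_ok=True)
    for split, items in splits.items():
        with open(os.path.join(list_dir, split + ".txt"), "w") as f:
            f.writelines(name + "\n" for name in items)
    return {split: len(items) for split, items in splits.items()}
```

With the layout above, `write_split_files("pasc_imgs/JPEGImages", "pasc_imgs/ImageSets")` should report 85/15/100 for 100 images; because the script globs every *.txt in --list_folder, all three splits get converted.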

Next, I exported the "pascal dataset" directory to TFRecord (from the "models-master/research/deeplab/datasets" folder):

```
py build_voc2012_data.py --image_folder="pasc_imgs/JPEGImages" --semantic_segmentation_folder="pasc_imgs/SegmentationClassPNG" --list_folder="pasc_imgs/ImageSets" --image_format="jpg" --output_dir="train/tfrecord"
```

This works fine: it generates the *.tfrecord files in "train/tfrecord".
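A quick sanity check, sketched under assumptions: the record count per split should match the numbers later typed into data_generator.py. The sketch walks the TFRecord framing (8-byte length, 4-byte length CRC, payload, 4-byte payload CRC) using only the standard library and does not verify the CRCs. The shard pattern "<split>-*.tfrecord" assumes the naming build_voc2012_data.py uses (e.g. train-00000-of-00004.tfrecord); adjust it if your files are named differently.

```python
import glob
import os
import struct


def count_tfrecord_records(path):
    """Count records in one TFRecord file by walking its framing.

    Each record is: uint64 length, uint32 CRC of the length, `length` payload
    bytes, uint32 CRC of the payload (little-endian). CRCs are not checked.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # 8-byte length + 4-byte length CRC
            if not header:
                return count  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated header in %s" % path)
            (length,) = struct.unpack("<Q", header[:8])
            if len(f.read(length + 4)) < length + 4:  # payload + its CRC
                raise ValueError("truncated record in %s" % path)
            count += 1


def count_split(tfrecord_dir, split):
    """Sum the records over all shards of one split."""
    pattern = os.path.join(tfrecord_dir, "%s-*.tfrecord" % split)
    return sum(count_tfrecord_records(p) for p in glob.glob(pattern))
```

For this dataset, `count_split("train/tfrecord", "train")` would be expected to return 85, and "trainval" 100.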

I edited "models-master/research/deeplab/data_generator.py" with the split sizes {'train': 85, 'trainval': 100, 'val': 15} and num_classes=2.
Then I started the training (from "models-master/research/deeplab"). Why 10000 steps? I had tried 30000 steps, which took about 30 hours and gave no results, so I reduced the step count and changed some parameters. I assumed 10000 steps would be enough to show something:
```
py train.py --logtostderr  --training_number_of_steps=10000 --train_split="train" --model_variant="mobilenet_v2" --output_stride=16 --decoder_output_stride=4 --train_batch_size=1 --dataset="pascal_voc_seg"  --train_logdir="datasets/train/deeplab_model_mn" --dataset_dir="datasets/train/tfrecord"
```
- This step takes almost 8 hours (I have a tiny GPU, so I can't use it), and it generates the checkpoint, graph.pbtxt and model.ckpt-XXX files (including model.ckpt-10000).
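For scale, here is the arithmetic behind those numbers: with --train_batch_size=1 each step consumes one image, so the step count converts directly into passes over the 85-image train split. The helper names are illustrative only.

```python
def epochs_seen(num_steps, batch_size, num_images):
    """Approximate number of passes over the split: each step uses batch_size images."""
    return num_steps * batch_size / num_images


def seconds_per_step(hours, num_steps):
    """Average wall-clock time per training step."""
    return hours * 3600 / num_steps


print(round(epochs_seen(10000, 1, 85), 1))  # 117.6
print(round(epochs_seen(30000, 1, 85), 1))  # 352.9
print(seconds_per_step(8, 10000))           # 2.88
```

So the 10000-step run is roughly 118 passes over the training images at about 2.9 s per step on this machine.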

I exported the resulting checkpoint with this command (from "models-master/research/deeplab"):

```
py export_model.py --checkpoint_path=datasets/train/deeplab_model_mn/model.ckpt-10000 --export_path=datasets/train/deeplab_inference_mn/frozen_inference_graph.pb --model_variant="mobilenet_v2" --output_stride=16 --num_classes=2
```

It creates the frozen graph (frozen_inference_graph.pb).

Then I ran "py main.py" (with the test image and frozen_inference_graph.pb already in place).
It gives no results with my custom model. The same script works with the pre-trained mobilenetv2_coco_voc_trainaug model, just not with mine.
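To make "no results" concrete, it can help to inspect the predicted label map rather than the overlay. A minimal sketch, assuming main.py still holds the per-pixel class-id array that the DeepLab demo calls seg_map; `label_fractions` is a hypothetical helper. If the fraction for class 1 is zero, the model is predicting background everywhere.

```python
import numpy as np


def label_fractions(seg_map, num_classes=2):
    """Fraction of pixels predicted as each class id (index = class id)."""
    seg_map = np.asarray(seg_map, dtype=np.int64)
    counts = np.bincount(seg_map.ravel(), minlength=num_classes)
    return counts / seg_map.size
```

For example, `label_fractions(seg_map)` returning `[1.0, 0.0]` means every pixel was labeled background.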

data_generator.py (edited lines):

```python
_PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 85,
        'trainval': 100,
        'val': 15,
    },
    num_classes=2,  # 0: background, 1: shirt
    ignore_label=255,
)
```
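Because splits_to_sizes is typed by hand, a small check against the list files can catch a mismatch early. This is a sketch; `check_split_sizes` is a hypothetical helper, and it assumes the lists are named <split>.txt with one file name per non-empty line.

```python
import os


def check_split_sizes(list_dir, splits_to_sizes):
    """Compare declared split sizes with the line counts of <split>.txt files.

    Returns {split: (declared, actual)} for every mismatch; empty means OK.
    """
    mismatches = {}
    for split, declared in splits_to_sizes.items():
        with open(os.path.join(list_dir, split + ".txt")) as f:
            actual = sum(1 for line in f if line.strip())
        if actual != declared:
            mismatches[split] = (declared, actual)
    return mismatches
```

Calling it with "pasc_imgs/ImageSets" and the dictionary above should return an empty dict.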

An example image (1 of 100) from my training set, with its mask (created with the labelMe utility):
[Images: shirt_001.jpg and its mask shirt_001.png]

main.py results for mobilenetv2_coco_voc_trainaug (it labels the shirt as a person, which is fine) and for my custom model:
[Images: mobilenetv2_coco_voc_trainaug result; my custom model result]

As you can see, my model fails. I've been testing many combinations without success. What should I do? Thank you!

Solution

OK, I had the same problem, and after many attempts I got it working.

First, you need correct masks. If you use one class, create the masks with an indexed color map in which every pixel is either 0 (background) or 1 (mask); an indexed color map has up to 256 entries.

Second, you need a bigger dataset. I trained on a dataset of ~200 images and got no results (even with correct masks), even at checkpoint 30k. When I trained on 450 images, I started getting some results only from around step 9000. There was no improvement after around step 18000, but the results were plausible (though far from ideal). I then trained a model on 1100 images, and the results were the same.
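A minimal sketch of the mask preparation described above, assuming the labelMe output is an RGB image whose background is pure black and in which every other color belongs to the shirt (anti-aliased edge pixels therefore count as shirt). The function names are hypothetical. The result is a single-channel uint8 array whose values are the class ids; it can be saved, for example, with Pillow's `Image.fromarray(labels).save("shirt_001.png")`. DeepLab's own Pascal VOC pipeline runs datasets/remove_gt_colormap.py for a similar purpose.

```python
import numpy as np


def rgb_mask_to_labels(rgb, background=(0, 0, 0)):
    """Map an H x W x 3 color mask to class ids: background -> 0, any other color -> 1."""
    rgb = np.asarray(rgb)
    is_background = np.all(rgb[..., :3] == np.array(background), axis=-1)
    return np.where(is_background, 0, 1).astype(np.uint8)


def check_labels(labels, num_classes=2, ignore_label=255):
    """Return the pixel values that are neither valid class ids nor ignore_label."""
    values = set(np.unique(labels).tolist())
    allowed = set(range(num_classes)) | {ignore_label}
    return values - allowed
```

Running `check_labels` on every mask before build_voc2012_data.py should return an empty set; any other value (for example a gray level left over from a color mask) points to a mask that still needs converting.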