
What should a labelled image for semantic segmentation look like?


As I understand from the explanation below, there are two types of images involved in semantic segmentation: inputs and masks. Mask images are images whose pixel values encode a 'label', which could be an integer (0 for ROAD, 1 for TREE) or an RGB colour ((100,100,100) for ROAD, (0,255,0) for TREE).

Semantic segmentation describes the process of associating each pixel of an image with a class label (such as flower, person, road, sky, ocean, or car). https://se.mathworks.com/help/vision/ug/semantic-segmentation-basics.html

According to my research, there are many kinds of labelled images for semantic segmentation. Besides the different file extensions (.png, .jpg, .gif, .bmp, ...), some of them are RGB-labelled (3-channel) images and some are grayscale (1-channel) images. Below are two examples that illustrate this.

  1. RGB labelled with the extension '.png'

    https://github.com/divamgupta/image-segmentation-keras#user-content-preparing-the-data-for-training

  2. Grayscale labelled with the extension '.gif'

    https://www.kaggle.com/kmader/vgg16-u-net-on-carvana/#data

If my mask is labelled in grayscale, I can make it RGB by copying the single gray channel into each of the three RGB channels. Conversely, by averaging the three RGB channels I can convert the labelled image back to grayscale. What is the difference? Which one is more suitable for which task (binary segmentation or something else)?
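The channel replication and averaging described above can be sketched with NumPy; the tiny 2x2 mask here is a made-up placeholder:

```python
import numpy as np

# Assumed grayscale mask: one integer class id per pixel (values 0..3).
gray = np.array([[0, 1],
                 [2, 3]], dtype=np.uint8)

# Grayscale -> RGB: replicate the single channel three times.
rgb = np.stack([gray] * 3, axis=-1)        # shape (H, W, 3)

# RGB -> grayscale: average the three channels.
back = rgb.mean(axis=-1).astype(np.uint8)  # shape (H, W)

print(np.array_equal(gray, back))          # True
```

Note that this round trip is only lossless because the RGB mask was built by replication. Averaging an arbitrary colour-coded mask (e.g. (0,255,0) for TREE) could merge distinct classes into the same gray value, so colour masks are normally mapped back to class ids with an explicit colour-to-id lookup table instead of averaging.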

In my case, I have 4 classes and am trying to do multiclass semantic segmentation. I have already labelled about 600 images on DataTurks. That means I just have the objects' polygons, and I have to generate the mask images myself. For now, the extensions of my input images and mask images are '.jpg' and '.png' respectively. How should I label my images, and with which extension?


Solution

  • You can save the masks as grayscale PNG images, with the value at each pixel being one of 0, 1, 2, 3 (since you have 4 classes), corresponding to the class (tree, bush, etc.) of that pixel in the input image.

    You can verify that a mask image was generated correctly like this:

    import cv2
    import numpy as np

    lbl_img = '<path_to_mask_image>'
    mask = cv2.imread(lbl_img, 0)  # flag 0 loads the image as single-channel grayscale
    print(np.unique(mask))


    [0 1 2 3] # this will vary based on the number of classes present in the mask image