
How to save and reuse semantic segmentation results?


I use detectron2 to run semantic segmentation on images. Detectron2 has a built-in function for visualizing the results, but I want to save the segmentation results and parse them later when needed. Tracing the code back, I found that the instances_to_coco_json function is the one that outputs the segmentation results, so I tried saving what it returns.

The result is in the following format:

 {
        "image_id": 1, 
        "segmentation": {
            "counts": "R[W<=Sf0001O000000000000000000000000000000000000000^_\\?", 
            "size": [
                720, 
                1280
            ]
        }, 
        "category_id": 1, 
        "score": 0.992115
    }, 

I was expecting to get the segmentation results as coordinates of the segmentation points like the following:

 "segmentation": [
            [
                662.1764705882352, 
                387, 
                686, 
                386.5882352941176, 
                686, 
                398, 
                662.7647058823529, 
                399
            ]

Given that the output is in COCO format, how do I make sense of it?


Solution

  • To understand the problem, you need to know that the COCO format has two different ways of storing masks. One uses polygons, as in your second example; the other uses a compressed binary encoding called RLE (run-length encoding), as in your first example.

    In COCO, if a mask is stored in RLE format, the segmentation is an object with the keys "counts" and "size". If you check detectron2/detectron2/utils/visualizer.py, you will find the code that handles the different mask formats in the constructor of the GenericMask class.
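
    To make the two keys concrete, here is a minimal pure-Python sketch of the uncompressed form of COCO RLE, where "counts" is a plain list of run lengths and "size" is [height, width]. The function name and the toy 3x4 mask are made up for illustration. The string you see in "counts" is a further compressed text encoding of such a run list, which this sketch does not attempt to decode; leave that to pycocotools.

```python
def decode_uncompressed_rle(counts, size):
    """Expand uncompressed COCO-style RLE run lengths into a 2D list of 0/1 rows."""
    height, width = size
    # Runs alternate between 0s and 1s, and the first run is always 0s.
    flat = []
    value = 0
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("run lengths do not add up to height * width")
    # Pixels are stored column by column (column-major order), so the pixel at
    # (row, col) sits at index col * height + row in the flat sequence.
    return [[flat[col * height + row] for col in range(width)]
            for row in range(height)]


# Hypothetical toy mask, not taken from the question.
toy_mask = decode_uncompressed_rle([4, 3, 5], size=[3, 4])
for row in toy_mask:
    print(row)
```

    The column-major order is why the decoded rows do not show the runs as horizontal stripes.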

    You can easily convert the RLE format to binary masks or polygons. The visualization script at detectron2/tools/visualize_json_results.py is also very helpful.

    TLDR:

    In short, to convert an RLE segmentation to a binary mask, run the following script (this assumes you have installed the COCO Python API, pycocotools, which is a prerequisite of detectron2):

    import pycocotools.mask as mask_util
    
    # Using the segment provided by your first example
    segment = {'counts': 'R[W<=Sf0001O000000000000000000000000000000000000000^_\\?',
     'size': [720, 1280]}
    # Decode segment into a uint8 NumPy array of shape (720, 1280);
    # pixels belonging to the mask are 1, all others are 0
    mask = mask_util.decode(segment)
    

    If you are interested in converting binary masks to polygons, another package called imantics can help you do that:

    import numpy as np
    from imantics import Polygons, Mask
    
    # This can be any array
    array = np.ones((100, 100))
    
    polygons = Mask(array).polygons()
    
    print(polygons.points)
    print(polygons.segmentation)
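
    If you end up with COCO-style polygons like the one in your second example, each polygon is a flat list [x1, y1, x2, y2, ...]. A small sketch for turning that into (x, y) pairs; the helper name is made up for illustration:

```python
def polygon_points(flat):
    """Turn a COCO flat polygon [x1, y1, x2, y2, ...] into [(x1, y1), (x2, y2), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of coordinates")
    # Even positions hold x values, odd positions hold y values.
    return list(zip(flat[0::2], flat[1::2]))


# Polygon copied from the question's second example.
flat = [662.1764705882352, 387, 686, 386.5882352941176,
        686, 398, 662.7647058823529, 399]
points = polygon_points(flat)
print(points)
```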
    

    Hope that helps.