python machine-learning caffe coreml

Shaping output array from Caffe segmentation for PIL Image


Hope everyone's day (or night) is going well.

I've been playing around with a Caffe model I came across, and I've been having some trouble working with the output array. I haven't worked with segmentation before so this may be a simple fix for someone more knowledgeable on the subject.

The model is based on the paper "Deep Joint Task Learning for Generic Object Extraction". I have converted the model to CoreML format.

The issue I have is this:

When I try to create a PIL image from the output, I get what looks like random noise. I think it's just a simple issue of the numpy array being mis-shaped or the pixels being in the wrong order. The output array has shape (2500, 1), and it's supposed to be a 50x50 black and white image.

Code looks like this:

import numpy
from PIL import Image

image = Image.open('./1.jpg')
image = image.resize((55, 55), Image.ANTIALIAS)

predictions = model.predict({'data_55': image} , useCPUOnly = False)
predictions = predictions['fc8_seg']

reshape_array = numpy.reshape(predictions, (50,50))
output_image = Image.fromarray(reshape_array, '1')

I've tried both F and C orders in the numpy reshape and can't get anything other than noise. I'm using one of the test images provided in the original repo, so the input shouldn't be the problem. As a side note, the values in the array look like this:

[[  4.55798066e-08   5.40980977e-07   2.13476710e-06 ...,   6.66990445e-08
6.81615759e-08   3.21255470e-07]
[  2.69358861e-05   1.94866928e-07   4.71876803e-07 ...,   1.25911642e-10
3.14572794e-08   1.61371077e-08]

Any thoughts or answers would be much appreciated and helpful. Thanks ahead of time!


Solution

  • Looks like I was able to figure this out. It wasn't an issue with the order of the array, but with the values and data type. Here is the code I put together to get a proper image from the output.

    predictions = model.predict({'data_55': image} , useCPUOnly = True) # Run the prediction
    
    map_final = predictions['fc8_seg'][0,0,:,:] # fc8_seg is the output of the neural network
    map_final = map_final.reshape((50,50)) # Reshape the output from shape (2500) to (50, 50)
    map_final = numpy.flip(map_final, 1) # Flip axis 1 to unmirror the image
    
    # Scale the values in the array to a range between 0 and 255
    map_final -= map_final.min() 
    map_final /= map_final.max()
    map_final = numpy.ceil(map_final*255)
    
    map_final_uint8 = map_final.astype(numpy.uint8) # Convert the data type to uint8
    pil_image = Image.fromarray(map_final_uint8, mode = 'L') # Create the PIL image
    

    And the output: [image]

    Everything looks just as it should!
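The scaling step above can also be wrapped in a small helper. This is only a sketch, not the code from the answer: `to_uint8_grayscale` and `fake_output` are hypothetical names, the input is synthetic data standing in for the `fc8_seg` output, and it rounds to the nearest integer where the answer uses `numpy.ceil`. It also guards against a constant array, where dividing by the maximum would divide by zero.

```python
import numpy as np


def to_uint8_grayscale(values):
    """Min-max scale a float array to 0..255 and return it as uint8.

    Hypothetical helper for illustration; not part of the original answer.
    """
    arr = np.asarray(values, dtype=np.float64)
    lo, hi = arr.min(), arr.max()
    if hi == lo:
        # A constant array has no contrast; return zeros instead of dividing by zero.
        return np.zeros(arr.shape, dtype=np.uint8)
    scaled = (arr - lo) / (hi - lo)  # out-of-place, so the caller's array is untouched
    return np.rint(scaled * 255).astype(np.uint8)


# Synthetic stand-in for the network output: tiny float32 values in a (2500, 1) array,
# similar in magnitude to the values printed in the question.
fake_output = np.linspace(1e-10, 3e-5, 2500, dtype=np.float32).reshape(2500, 1)

mask = to_uint8_grayscale(fake_output.reshape(50, 50))
print(mask.dtype, mask.min(), mask.max())
```

The resulting 2-D `uint8` array can then be passed to `Image.fromarray` as in the answer. Because the values already span 0 to 255, it should display as an 8-bit grayscale image instead of noise.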