
CV2 imread PNG with colortype 4 (gray+alpha) reads as (x,y,4) array


I have a PNG file with two 8-bit channels (gray + alpha), but cv2.imread(fname, cv2.IMREAD_UNCHANGED).shape returns (768, 1024, 4). I have verified that the colortype field in the IHDR chunk of the PNG file is 4.

The PNG file was created with the latest version of GIMP. I'm using OpenCV 4.8.1 and also tried 4.7.0. I've tested on several images. To create such an image, I loaded a JPG image into GIMP, changed the image mode to grayscale, then exported the image as PNG with pixel format "8bpc GRAYA".

Using Notepad++ with the hex editor plug-in, the start of the PNG file is:

PNG signature (8 bytes), followed by the IHDR chunk length (4 bytes, 0x0d = 13):
89 50 4e 47 0d 0a 1a 0a 00 00 00 0d

then the IHDR chunk:
49 48 44 52    00 00 04 00    00 00 03 00    08 04 00 00    00
meaning IHDR fields:
Width:              4 bytes   (0x00000400 = 1024)
Height:             4 bytes   (0x00000300 = 768)
Bit depth:          1 byte    (8 bits)
Color type:         1 byte    (4 = gray+alpha)
Compression method: 1 byte    (0)
Filter method:      1 byte    (0)
Interlace method:   1 byte    (0)
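
The same fields can be read programmatically instead of with a hex editor. The following is a minimal sketch using only the Python standard library. It assumes IHDR is the first chunk directly after the 8-byte signature (as the PNG format requires), and `parse_ihdr` and `COLOR_TYPES` are hypothetical names chosen for illustration. The sample bytes are the ones quoted above.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Colour type codes defined by the PNG format
COLOR_TYPES = {0: "gray", 2: "RGB", 3: "palette", 4: "gray+alpha", 6: "RGBA"}


def parse_ihdr(data: bytes) -> dict:
    """Decode the IHDR fields from the first 29 bytes of a PNG file."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Chunk header: 4-byte big-endian length, then 4-byte chunk type
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("IHDR chunk not found where expected")
    # IHDR payload: width, height (4 bytes each), then five 1-byte fields
    (width, height, bit_depth, color_type,
     compression, filter_method, interlace) = struct.unpack(">IIBBBBB", data[16:29])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,
        "color_type_name": COLOR_TYPES.get(color_type, "unknown"),
        "compression": compression,
        "filter": filter_method,
        "interlace": interlace,
    }


# The bytes quoted in the hex dump above
header = bytes.fromhex(
    "89504e470d0a1a0a"   # PNG signature
    "0000000d"           # IHDR chunk length (13)
    "49484452"           # "IHDR"
    "00000400"           # width
    "00000300"           # height
    "0804000000"         # bit depth, colour type, compression, filter, interlace
)
info = parse_ihdr(header)
print(info["width"], info["height"], info["bit_depth"], info["color_type_name"])
# prints: 1024 768 8 gray+alpha
```

For a file on disk, reading the first 29 bytes in binary mode and passing them to `parse_ihdr` should be enough.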

Why does cv2.imread(fname, cv2.IMREAD_UNCHANGED).shape report a 4-channel array for a 2-channel image?


Solution

  • As a work-around, you can open the image with PIL, convert it to a NumPy array, and then work on that array with OpenCV:

    # pip install pillow
    from PIL import Image
    import numpy as np
    
    # Open image with PIL
    im = Image.open('YOURIMAGE.PNG')
    
    # Check size and type
    print(im)
    

    Output: this shows the image is in "LA" mode, which is "Luminance+Alpha", i.e. greyscale + alpha

    <PIL.PngImagePlugin.PngImageFile image mode=LA size=100x100

    # Convert to Numpy array for use by OpenCV
    na = np.array(im)
    
    print(na.shape)         # prints (100,100,2)
    

    I'm being over-verbose above so you can see what is happening along the way. You only really need:

    na = np.array(Image.open('image.png'))
    

    Note that you can make a gradient image of PNG colour type 4 (grey+alpha) with a transparent hole in it using ImageMagick like this:

    magick -size 640x480 gradient: -fill red -draw "rectangle 100,100 200,200" -transparent red -colorspace gray result.png
    


    You can check its colour type with either of the following:

    exiftool -ColorType result.png
    Color Type                      : Grayscale with Alpha
    

    or

    identify -verbose result.png | grep png:
    
    png:bKGD: chunk was found (see Background color, above)
    png:IHDR.bit-depth-orig: 16
    png:IHDR.bit_depth: 16
    png:IHDR.color-type-orig: 4
    png:IHDR.color_type: 4 (GrayAlpha)
    png:IHDR.interlace_method: 0 (Not interlaced)
    png:IHDR.width,height: 640, 480
    png:text: 3 tEXt/zTXt/iTXt chunks were found
    png:tIME: 2023-12-02T12:26:46Z
    

    You can load it with OpenCV like this:

    import cv2 as cv
    im = cv.imread('result.png', cv.IMREAD_UNCHANGED)
    print(im.shape)        # prints (480, 640, 4)
    

    It seems the grey channel is repeated 3 times as the B, G and R channels, with alpha kept as the fourth channel, so you could discard the green and red channels with:

    twoChannels = im[:,:, [0,3]].copy()
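
Since the claim that the three colour channels are identical is an observation rather than documented behaviour, it may be worth checking it before discarding data. Below is a minimal sketch of such a check. `bgra_to_gray_alpha` is a hypothetical helper name, and the input is a synthetic NumPy array standing in for the result of `cv.imread(..., cv.IMREAD_UNCHANGED)`, so the snippet runs without an image file.

```python
import numpy as np


def bgra_to_gray_alpha(bgra: np.ndarray) -> np.ndarray:
    """Reduce a 4-channel BGRA array to 2 channels (grey, alpha).

    Raises ValueError if B, G and R are not identical, i.e. if the
    image is not really greyscale.
    """
    if bgra.ndim != 3 or bgra.shape[2] != 4:
        raise ValueError("expected an array of shape (height, width, 4)")
    b, g, r = bgra[..., 0], bgra[..., 1], bgra[..., 2]
    if not (np.array_equal(b, g) and np.array_equal(b, r)):
        raise ValueError("colour channels differ; image is not pure grey")
    # Keep channel 0 (grey) and channel 3 (alpha) as a contiguous copy
    return bgra[..., [0, 3]].copy()


# Synthetic stand-in for an image loaded with cv.IMREAD_UNCHANGED
gray = np.arange(12, dtype=np.uint8).reshape(3, 4)
alpha = np.full((3, 4), 255, dtype=np.uint8)
fake_bgra = np.dstack([gray, gray, gray, alpha])

two_channels = bgra_to_gray_alpha(fake_bgra)
print(two_channels.shape)   # prints (3, 4, 2)
```

With a real file, the array returned by `cv.imread('result.png', cv.IMREAD_UNCHANGED)` would be passed in place of `fake_bgra`.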