Interpret PNG pixel data


Looking at the PNG specification, it appears that the PNG pixel data chunk starts with IDAT and ends with IEND (slightly clearer explanation here). In the middle are values that don't make sense to me.

How can I get usable RGB values from this without using any libraries (i.e., straight from the raw binary file)?

As an example, I made a 2x2px image with 4 black rgb(0,0,0) pixels in Photoshop:
Just four black pixels...

Here's the resulting data (in the raw binary input, the hex values, and the human-readable ASCII):

BINARY      HEX ASCII
01001001    49  'I'
01000100    44  'D'
01000001    41  'A'
01010100    54  'T'
01111000    78  'x'
11011010    DA  '\xda'
01100010    62  'b'
01100000    60  '`'
01000000    40  '@'
00000110    06  '\x06'
00000000    00  '\x00'
00000000    00  '\x00'
00000000    00  '\x00'
00000000    00  '\x00'
11111111    FF  '\xff'
11111111    FF  '\xff'
00000011    03  '\x03'
00000000    00  '\x00'
00000000    00  '\x00'
00001110    0E  '\x0e'
00000000    00  '\x00'
00000001    01  '\x01'
10000011    83  '\x83'
11010100    D4  '\xd4'
11101100    EC  '\xec'
10001110    8E  '\x8e'
00000000    00  '\x00'
00000000    00  '\x00'
00000000    00  '\x00'
00000000    00  '\x00'
01001001    49  'I'
01000101    45  'E'
01001110    4E  'N'
01000100    44  'D'

Solution

  • You missed a rather crucial detail in both specifications:

    The official one:

    .. The IDAT chunk contains the actual image data which is the output stream of the compression algorithm.
    [...]
    Deflate-compressed datastreams within PNG are stored in the "zlib" format.

    Wikipedia:

    IDAT contains the image, which may be split among multiple IDAT chunks. Such splitting increases filesize slightly, but makes it possible to generate a PNG in a streaming manner. The IDAT chunk contains the actual image data, which is the output stream of the compression algorithm.

    Both state the raw image data is compressed. Looking at your data, the first 2 bytes

    78 DA
    

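    are the zlib header (compression method and flags) as specified in RFC 1950. What follows is the Deflate-compressed data, and the last 4 bytes of the zlib stream are an Adler-32 checksum of the uncompressed data.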
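As a rough illustration of what those two header bytes encode, here is a minimal Python sketch that splits them into the fields RFC 1950 defines. The function name `parse_zlib_header` and the dictionary keys are made up for this example.

```python
def parse_zlib_header(cmf: int, flg: int) -> dict:
    """Split the two zlib header bytes into the fields defined by RFC 1950."""
    return {
        "method": cmf & 0x0F,                      # CM: 8 means Deflate
        "window_bits": (cmf >> 4) + 8,             # CINFO is log2(window size) - 8
        "has_preset_dict": bool(flg & 0x20),       # FDICT flag
        "level_hint": flg >> 6,                    # FLEVEL: 0 (fastest) to 3 (maximum)
        "check_ok": (cmf * 256 + flg) % 31 == 0,   # FCHECK makes the pair a multiple of 31
    }

# The first two bytes after the IDAT tag in the dump
header = parse_zlib_header(0x78, 0xDA)
print(header)
```

For `78 DA` this reports Deflate with a 32 KiB window, no preset dictionary, and the "maximum compression" level hint, so everything after those two bytes is a Deflate stream rather than pixel values.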

    Decompressing this with a general zlib-compatible routine shows 14 bytes of output:

    00 00 00 00 00 00 00
    00 00 00 00 00 00 00
    

    where the first byte of each row is the PNG row filter type (0 for both rows), followed by 2 RGB triplets (0,0,0), once for each of the 2 rows of your image.
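To check this against the dump from the question, the sketch below feeds the 18 bytes that follow the `IDAT` tag to Python's standard `zlib` module. It assumes that the next four bytes (`83 D4 EC 8E`) are the chunk's CRC and not part of the stream, and that the image is 2x2 with 8-bit RGB pixels. It uses a library, so it only verifies the expected result; it is not the library-free decoder the question asks for.

```python
import zlib

# The 18 bytes between the "IDAT" tag and the chunk CRC in the question's dump:
# zlib header, Deflate data, then the Adler-32 checksum.
idat_payload = bytes.fromhex("78da62604006" "00000000ffff" "0300" "000e0001")

raw = zlib.decompress(idat_payload)

width, bytes_per_pixel = 2, 3          # assumed: 2x2 image, 8-bit RGB
stride = 1 + width * bytes_per_pixel   # one filter byte plus the pixel bytes

for start in range(0, len(raw), stride):
    row = raw[start:start + stride]
    filter_type, pixel_bytes = row[0], row[1:]
    pixels = [tuple(pixel_bytes[i:i + 3]) for i in range(0, len(pixel_bytes), 3)]
    print(f"filter={filter_type} pixels={pixels}")
```

Each printed row should show filter type 0 followed by two black pixels.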

    "Without using any libraries" you need 3 separate routines to:

    1. read and parse the PNG superstructure; this provides the IDAT compressed data, as well as essential information such as width, height, and color depth;
    2. decompress the zlib part(s) into raw binary data;
    3. parse the decompressed data, handling Adam-7 interlacing if required, and applying row filters.
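Step 1 of the list above could be sketched roughly as follows, assuming the whole file is already in memory as `bytes`. The names `iter_chunks`, `read_png`, and `make_chunk` are invented for this example, and `zlib.crc32` is borrowed from the standard library for the chunk checksum; a library-free version would need its own CRC-32 routine.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def iter_chunks(data: bytes):
    """Yield (type, payload) for each chunk; raise on a bad signature or CRC."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(ctype + payload) != crc:
            raise ValueError(f"CRC mismatch in {ctype!r} chunk")
        yield ctype, payload
        pos += 12 + length

def read_png(data: bytes):
    """Return (IHDR fields as a dict, concatenated IDAT bytes)."""
    header, idat = None, bytearray()
    for ctype, payload in iter_chunks(data):
        if ctype == b"IHDR":
            keys = ("width", "height", "bit_depth", "color_type",
                    "compression", "filter", "interlace")
            header = dict(zip(keys, struct.unpack(">IIBBBBB", payload)))
        elif ctype == b"IDAT":
            idat += payload   # several IDAT chunks together form one zlib stream
        elif ctype == b"IEND":
            break
    return header, bytes(idat)

def make_chunk(ctype: bytes, payload: bytes) -> bytes:
    """Hypothetical helper, used only to build a tiny test image for the demo."""
    crc = zlib.crc32(ctype + payload)
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)

# Demo: a synthetic 2x2 8-bit RGB image (color type 2) with all-black pixels
demo_png = (PNG_SIGNATURE
            + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 2, 2, 8, 2, 0, 0, 0))
            + make_chunk(b"IDAT", zlib.compress(bytes(14)))
            + make_chunk(b"IEND", b""))
header, idat = read_png(demo_png)
print(header, len(idat))
```

The IDAT payloads are concatenated before decompression because the compressed stream may be split across chunk boundaries at arbitrary points.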

    Only after performing these three steps will you have access to the raw image data. Of these, you seem to have a good grasp of step (1). Step (2) is much harder to do yourself; personally, I cheated and used miniz in my own PNG handling programs. Step (3), again, is merely a question of determination. All the necessary bits of information can be found on the web, but it takes a while to put everything in the right order. (Just recently I found an error in my implementation of the Paeth row filter; it had gone unnoticed because that filter is fairly rarely used in 'real world' images.)
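For step (3), a minimal sketch of undoing the five row filter types might look like this. It assumes a non-interlaced image whose pixels occupy a whole number of bytes (`bpp`, for example 3 for 8-bit RGB); Adam-7 interlacing and bit depths below 8 are not handled. The function names are invented for the example.

```python
def paeth(a: int, b: int, c: int) -> int:
    """Paeth predictor: return whichever of left (a), up (b), upper-left (c)
    is closest to a + b - c, preferring a, then b, on ties."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c

def unfilter(raw: bytes, width: int, height: int, bpp: int) -> list:
    """Undo PNG row filters; returns one bytes object of pixel data per row."""
    stride = width * bpp
    rows, prev, pos = [], bytes(stride), 0   # the row above the first is all zeros
    for _ in range(height):
        ftype = raw[pos]
        line = bytearray(raw[pos + 1:pos + 1 + stride])
        pos += 1 + stride
        for i in range(stride):
            a = line[i - bpp] if i >= bpp else 0   # left neighbour (already unfiltered)
            b = prev[i]                            # byte above
            c = prev[i - bpp] if i >= bpp else 0   # byte above-left
            if ftype == 0:
                pred = 0                  # None
            elif ftype == 1:
                pred = a                  # Sub
            elif ftype == 2:
                pred = b                  # Up
            elif ftype == 3:
                pred = (a + b) // 2       # Average
            elif ftype == 4:
                pred = paeth(a, b, c)     # Paeth
            else:
                raise ValueError(f"unknown filter type {ftype}")
            line[i] = (line[i] + pred) & 0xFF
        prev = bytes(line)
        rows.append(prev)
    return rows

# The 14 decompressed bytes from the example: two rows, filter type 0
print(unfilter(bytes(14), width=2, height=2, bpp=3))
```

Note that the predictors work on bytes, not pixels: the "left" neighbour is the corresponding byte of the previous pixel, which is why `bpp` is needed.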

    See "Building a fast PNG encoder issues" for a similar discussion and "Trying to understand zlib/deflate in PNG files" for an in-depth look into the Deflate scheme.