Search code examples
zipbit-manipulationcrc32

What is 'magic number' (0xdebb20e3) in pkzip's APPNOTE?


In section 4.4.7 (CRC-32) of APPNOTE.txt you can find

The 'magic number' for the CRC is 0xdebb20e3.

However, the regular polynomial (0x04C11DB7) of CRC-32 works fine. Bit-reflected, it is 0xEDB88320. I can't figure out what 0xdebb20e3 is.

I've tried to reflect and inverse bits of 0xdebb20e3, converting LE → BE and vice versa, did not get 0x04c11db7. 0xc704dd7b (rev), 0x38fb2284 (inv+rev), 0x2144df1c (inv); not even similar.


Solution

  • That is the complement of the CRC-32 of the original message, any original message, with its CRC-32 appended in little-endian order.

    So if I take the nine-byte message consisting of the ASCII digits "123456789", I get 0xcbf43926. If I then take the 13-byte message "123456789" followed by the four bytes 0x26 0x39 0xf4 0xcb, the CRC in little-endian order, I get 0x2144df1c. The complement of that is 0xdebb20e3.

    This is a constant for any given CRC definition, and is called the "residue". You can find it in the description of that CRC here (scroll right to the end of the line):

    width=32 poly=0x04c11db7 init=0xffffffff refin=true refout=true xorout=0xffffffff check=0xcbf43926 residue=0xdebb20e3 name="CRC-32/ISO-HDLC"
    

    The residue is the final contents of the CRC register before the final exclusive-or, which for this CRC is 0xffffffff, hence the complement. This CRC is appended to the message in little-endian order because it is reflected (refin=true and refout=true).

    It is a mathematical property of CRCs that that is a constant for all messages. It can be used to check the validity of a message, and is often used in hardware implementations of CRCs, where the CRC is appended to the message in the proper order, and you just keep running the CRC engine through the appended CRC and check to see if the contents of the register is the residue. If it is not, then there is an error in the message and/or the appended CRC.

    This is less useful in software implementations. There it is easier and faster to simply compare the CRC in the frame against the computed CRC of the message. If they are not equal, then there is an error in the message and/or the transmitted CRC. That is faster, since you don't need to compute the CRC on another four bytes, and you would have had to do a compare anyway. Also that is the way you would check the message integrity for any hash, where in general other hashes do not have this sort of mathematical property.