Search code examples
pythonccrc

Combining two non-pure CRC values


Based on this answer I understand principle behind combining CRC values of two blocks, given the fact that CRC of string of zeros is 0 and using linearity property. But in practice most implementations seed the register with some initial value and complement the final result to deal with erroneous insertion of leading/trailing 0's. So, the basic implementation of 1) setting register with CRC of first block, 2)feeding 0's in length of 2nd block and 3)XOR'ing the result in CRC of 2nd block doesn't work.

It looks like you need to "undo" the post conditioning before step 3.

I tried this with the following python program that uses CRC implementation in zlib library.

import zlib

def crc_combine(crcA, crcB, lenB):
     crcA0 = zlib.crc32(b'\0' * len(b), crcA ^ 0xffffffff) ^ 0xffffffff
     return crcA0 ^ crcB

 if __name__ == "__main__":
      a, b, ab = b'Hello', b'World', b'HelloWorld'
      assert zlib.crc32(ab) == crc_combine(zlib.crc32(a), zlib.crc32(b), len(b))

The combine operation works, but I would like to understand why ?

The library seems to use 0xffffffff as the pre/post condition so I used that value for the two XORs. Without the two XOR's the combine function doesn't do the right thing and the assertion fails. As I understand, the first XOR cancels-out with initializing CRC register with 0xffffffff within zlib.crc32() and ends up initializing the CRC register with crcA (which includes both pre/post conditioning). It then starts feeding 0s, thereby shifting the CRC to the left. The second XOR with 0xffffffff is necessary to undo the complement of result done by zlib.crc32(). Eventually the the result crcA0 is XOR'd with crcB (which also includes both pre/post conditions applied to it) to get combined CRC value.

Are the two XOR's with 0xfffffffff required to make 'CRCA0 ^ crcB' work in pure form ? But it looks we are feeding 0s after the register it initialized with crcA, which is both pre and post conditioned. Is there any intuitive explanation to this ?


Solution

  • CRC32 is a reflected CRC that initializes CRC to 0xffffffff and post xors CRC with 0xffffffff. If zlib.crc32() is called with an input CRC parameter, it xor's it with 0xffffffff, in order to "undo" the post xor from a prior call to zlib.crc32().

    It looks like you need to "undo" the post conditioning before step 3.

    zlib.crc32() is already "undoing" the post xor when it xor's the input CRC with 0xffffffff, but in this case, the 0xffffffff is needed in order to "undo" the initial CRC of 0xffffffff that was used for crcB:

    crcB:                 ff ff ff ff 00 - crcB initial crc
                          W  o  r  l  d  - crcB data
    
    crcA:                 ff ff ff ff 00 - crcA initial crc
                          H  e  l  l  o  - crcA data
    
    crcA0:                ff ff ff ff 00 - initial CRC to undo crcB initial CRC
                          00 00 00 00 00 - crcA0 data
           ff ff ff ff 00                - crcA cycled 5 times
           H  e  l  l  o
    

    Rather than using combine, the code could be:

    crc = zlib.crc32(a)
    crc = zlib.crc32(b, crc)
    

    If zlib included exponentiation and multiply modulo the crc32 poly = 0x4C11DB7, an alternative to computing crc32(b'\0' * len(b), crc) would be:

    multiply((2^(8*len(b)) * crc);    // all modulo 0x4C11DB7