Search code examples
pythoncportingcrc32

CRC computation port from C to Python


I need to convert the following CRC computation algorithm to Python:

#include <stdio.h>

unsigned int Crc32Table[256];

unsigned int crc32jam(const unsigned char *Block, unsigned int uSize)
{
    unsigned int x = -1; //initial value
    unsigned int c = 0;

    while (c < uSize)
    {
        x = ((x >> 8) ^ Crc32Table[((x ^ Block[c]) & 255)]);
        c++;
    }
    return x;
}

void crc32tab()
{
    unsigned int x, c, b;
    c = 0;

    while (c <= 255)
    {
        x = c;
        b = 0;
        while (b <= 7)
        {
            if ((x & 1) != 0)
                x = ((x >> 1) ^ 0xEDB88320); //polynomial
            else
                x = (x >> 1);
            b++;
        }
        Crc32Table[c] = x;
        c++;
    }
}

int main() {
    unsigned char buff[] = "whatever buffer content";
    unsigned int l = sizeof(buff) -1;
    unsigned int hash;

    crc32tab();
    hash = crc32jam(buff, l);
    printf("%d\n", hash);
}

two (failed) attempts to rewrite this in python follow:

def crc32_1(buf):
    crc = 0xffffffff
    for b in buf:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ 0xedb88320 if crc & 1 else crc >> 1
    return crc ^ 0xffffffff


def crc32_2(block):
    table = [0] * 256
    for c in range(256):
        x = c
        b = 0
        for _ in range(8):
            if x & 1:
                x = ((x >> 1) ^ 0xEDB88320)
            else:
                x >>= 1
        table[c] = x
    x = -1
    for c in block:
        x = ((x >> 8) ^ table[((x ^ c) & 255)])
    return x & 0xffffffff


data = b'whatever buffer content'

print(crc32_1(data), crc32_2(data))

Using the three routines on thee exact same data yield three different results:

mcon@cinderella:~/Desktop/3xDAsav/DDDAedit$ ./test5 
2022541416
mcon@cinderella:~/Desktop/3xDAsav/DDDAedit$ python3 test5.py 
2272425879 2096952735

As said: C code is "Golden Standard", how do I fix this in Python?

Note: I know I can call C routines from Python, but I consider that as "last resort".


Solution

  • Instead of porting your own CRC32 implementation, you can use one from the Python standard library. For historic reasons, the standard library includes two identical1 CRC32 implementations:

    Both implementations match the behavior of your crc32_1 function:

    import binascii
    import zlib
    
    >>> print(binascii.crc32(b'whatever buffer content'))
    2272425879
    
    >>> print(zlib.crc32(b'whatever buffer content'))
    2272425879
    

    To get a result matching the C implementation from the question, you just need to apply a constant offset:

    >>> 0xffff_ffff - zlib.crc32(b'whatever buffer content')
    2022541416
    

    As a bonus, these CRC32 functions are implemented in efficient C code, and will be much faster than any equivalent pure-Python port.


    1Note that the zlib module is only available when CPython is compiled with zlib support (which is almost always true). In the off chance that you're using a CPythion build without zlib, you won't be able to use the zlib module. Instead, you can use the binascii implementation, which uses zlib when available and defaults to an "in-house" implementation when its not.