CRC-32 algorithm from HDL to software

I implemented a Galois Linear-Feedback Shift-Regiser in Verilog (and also in MATLAB, mainly to emulate the HDL design). It's been working great, and as of know I use MATLAB to calculate CRC-32 fields, and then include them in my HDL simulations to verify a data packet has arrived correctly (padding data with CRC-32), which produces good results.

The thing is I want to be able to calculate the CRC-32 I've implemented in software, because I'll be using a Raspberry Pi to input data through GPIO in my FPGA, and I haven't been able to do so. I've tried this online calculator, using the same parameters, but never get to yield the same result.

This is the MATLAB code I use to calculate my CRC-32:

N = 74*16;
data = [round(rand(1,N)) zeros(1,32)];
lfsr = ones(1,32);
next_lfsr = zeros(1,32);

for i = 1:length(data)
    next_lfsr(1) = lfsr(2);
    next_lfsr(2) = lfsr(3);
    next_lfsr(3) = lfsr(4);
    next_lfsr(4) = lfsr(5);
    next_lfsr(5) = lfsr(6);
    next_lfsr(6) = xor(lfsr(7),lfsr(1));
    next_lfsr(7) = lfsr(8);
    next_lfsr(8) = lfsr(9);
    next_lfsr(9) = xor(lfsr(10),lfsr(1));
    next_lfsr(10) = xor(lfsr(11),lfsr(1));
    next_lfsr(11) = lfsr(12);
    next_lfsr(12) = lfsr(13);
    next_lfsr(13) = lfsr(14);
    next_lfsr(14) = lfsr(15);
    next_lfsr(15) = lfsr(16);
    next_lfsr(16) = xor(lfsr(17), lfsr(1));
    next_lfsr(17) = lfsr(18);
    next_lfsr(18) = lfsr(19);
    next_lfsr(19) = lfsr(20);
    next_lfsr(20) = xor(lfsr(21),lfsr(1));
    next_lfsr(21) = xor(lfsr(22),lfsr(1));
    next_lfsr(22) = xor(lfsr(23),lfsr(1));
    next_lfsr(23) = lfsr(24);
    next_lfsr(24) = xor(lfsr(25), lfsr(1));
    next_lfsr(25) = xor(lfsr(26), lfsr(1));
    next_lfsr(26) = lfsr(27);
    next_lfsr(27) = xor(lfsr(28), lfsr(1));
    next_lfsr(28) = xor(lfsr(29), lfsr(1));
    next_lfsr(29) = lfsr(30);
    next_lfsr(30) = xor(lfsr(31), lfsr(1));
    next_lfsr(31) = xor(lfsr(32), lfsr(1));
    next_lfsr(32) = xor(data2(i), lfsr(1));

    lfsr = next_lfsr;
end

crc32 = lfsr;

See I use a 32-zeroes padding to calculate the CRC-32 in the first place (whatever's left in the LFSR at the end is my CRC-32, and if I do the same replacing the zeroes with this CRC-32, my LFSR becomes empty at the end too, which means the verification passed).

The polynomial I'm using is the standard for CRC-32: 04C11DB7. See also that the order seems to be reversed, but that's just because it's mirrored to have the input in the MSB. The results of using this representation and a mirrored one are the same when the input is the same, only the result will be also mirrored.

Any ideas would be of great help.

Thanks in advance

Solution

Your CRC is not a CRC. The last 32 bits fed in don't actually participate in the calculation, other than being exclusive-or'ed into the result. That is, if you replace the last 32 bits of data with zeros, do your calculation, and then exclusive-or the last 32 bits of data with the resulting "crc32", then you will get the same result.

So you will never get it to match another CRC calculation, since it isn't a CRC.

This code in C replicates your function, where the data bits come from the series of n bytes at p, least significant bit first, and the result is a 32-bit value:

unsigned long notacrc(void const *p, unsigned n) {
    unsigned char const *dat = p;
    unsigned long reg = 0xffffffff;
    while (n) {
        for (unsigned k = 0; k < 8; k++)
            reg = reg & 1 ? (reg >> 1) ^ 0xedb88320 : reg >> 1;
        reg ^= (unsigned long)*dat++ << 24;
        n--;
    }
    return reg;
}

You can immediately see that the last byte of data is simply exclusive-or'ed with the final register value. Less obvious is that the last four bytes are just exclusive-or'ed. This exactly equivalent version makes that evident:

unsigned long notacrc_xor(void const *p, unsigned n) {
    unsigned char const *dat = p;
    // initial register values
    unsigned long const init[] = {
        0xffffffff, 0x2dfd1072, 0xbe26ed00, 0x00be26ed, 0xdebb20e3};
    unsigned xor = n > 3 ? 4 : n;       // number of bytes merely xor'ed
    unsigned long reg = init[xor];
    while (n > xor) {
        reg ^= *dat++;
        for (unsigned k = 0; k < 8; k++)
            reg = reg & 1 ? (reg >> 1) ^ 0xedb88320 : reg >> 1;
        n--;
    }
    switch (n) {
        case 4:
            reg ^= *dat++;
        case 3:
            reg ^= (unsigned long)*dat++ << 8;
        case 2:
            reg ^= (unsigned long)*dat++ << 16;
        case 1:
            reg ^= (unsigned long)*dat++ << 24;
    }
    return reg;
}

There you can see that the last four bytes of the message, or all of the message if it is three or fewer bytes, is exclusive-or'ed with the final register value at the end.

An actual CRC must use all of the input data bits in determining when to exclusive-or the polynomial with the register. The inner part of that last function is what a CRC implementation looks like (though more efficient versions make use of pre-computed tables to process a byte or more at a time). Here is a function that computes an actual CRC:

unsigned long crc32_jam(void const *p, unsigned n) {
    unsigned char const *dat = p;
    unsigned long reg = 0xffffffff;
    while (n) {
        reg ^= *dat++;
        for (unsigned k = 0; k < 8; k++)
            reg = reg & 1 ? (reg >> 1) ^ 0xedb88320 : reg >> 1;
        n--;
    }
    return reg;
}

That one is called crc32_jam because it implements a particular CRC called "JAMCRC". That CRC is the closest to what you attempted to implement.

If you want to use a real CRC, you will need to update your Verilog implementation.