
Purpose of REFIN and REFOUT parameters in CRC Rocksoft model


The Rocksoft parameter model for CRCs is well known. It contains two boolean parameters called REFIN and REFOUT. I understand what they do and I understand how to implement them.

There is a lot of information on the web about the Rocksoft model and those parameters. What I don't understand (and what I couldn't find on the web) is the purpose of those parameters. Even Ross Williams' Painless Guide does not seem to explain the reason.

Is this about endianness of memory architectures? Is it about the transmission order of UARTs? Is it about implementations that "randomly" process data in the "wrong" order?

So, when implementing a (custom) CRC: When do I usually need to reflect input bytes? When do I usually need to reflect output bytes?


Solution

  • Usually you need to reflect neither in the implementation. For almost all CRC definitions, the refin and refout parameters are equal: either both the input and output are reflected, or neither is. If both are reflected, then instead of reflecting the input and output, the CRC implementation is itself reflected. The polynomial is reflected and the shifts are down instead of up, which has exactly the same effect as reflecting the input and output bits. However, it takes zero CPU time, since the reflection is done at implementation time instead of at run time.

    There is, oddly, one CRC definition in Greg Cook's catalogue, CRC-12/UMTS, with refin=false and refout=true. In that case, you would implement the CRC without reflection, and then reflect the resulting 12-bit CRC (which is still pretty quick).
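    A minimal sketch of that final reflection step, using a hypothetical helper that reverses the low 12 bits of the result:

            /* Reverse the low 12 bits of the final CRC (hypothetical helper). */
            static uint16_t reflect12(uint16_t crc) {
                uint16_t out = 0;
                for (unsigned k = 0; k < 12; k++) {
                    out = (out << 1) | (crc & 1);   /* move the lowest remaining bit up */
                    crc >>= 1;
                }
                return out;
            }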

    The decision to make a CRC reflected or not is usually made at the time a hardware shift register is implemented, where the order in which bits are fed into the shift register is, for example, the order in which the data is serialized. In some cases the least significant bit of the data goes out on the wire first, in which case the CRC is reflected; in other cases the most significant bit goes out first, in which case the CRC is not reflected.
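    To make that connection concrete, a rough sketch of clocking a single data bit into a reflected CRC-32 register, least significant bit of each byte first (the function name is only for illustration):

            /* One clock of a reflected CRC-32 register: XOR the incoming bit
               (LSB-first order) into the low end, then shift down. */
            static uint32_t crc32_clock_bit(uint32_t crc, unsigned bit) {
                crc ^= bit & 1;
                return crc & 1 ? (crc >> 1) ^ 0xedb88320 : crc >> 1;
            }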

    The other time such a decision might be made is when implementing a CRC purely in software, not beholden to any hardware implementation. In that case, some have chosen not to reflect the CRC, since that most closely matches how a CRC might be defined in a textbook. Others have chosen to reflect the CRC, since that can reduce the number of instructions in the implementation. Shifting down instead of up avoids a shift of the data or the CRC, for example. It can also avoid having to mask off the extraneous bits shifted up when an unreflected implementation is done in a larger machine word.

    In any case, the error detection performance of the CRC is the same, whether reflected or not.

    To illustrate the speed benefit, here is the operation done for each byte in a 32-bit unreflected bit-wise CRC calculation in software:

            crc ^= (uint32_t)data[i] << 24;
            for (unsigned k = 0; k < 8; k++)
                crc = crc & 0x80000000 ? (crc << 1) ^ 0x04c11db7 : crc << 1;
    

    Whereas for a reflected CRC, it's:

            crc ^= data[i];
            for (unsigned k = 0; k < 8; k++)
                crc = crc & 1 ? (crc >> 1) ^ 0xedb88320 : crc >> 1;
    

    (Those implement the same degree-32 polynomial; the bits of the polynomial are reflected in the reflected implementation, i.e. 0xedb88320 is 0x04c11db7 with its 32 bits reversed.)
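    Putting the reflected per-byte operation into a complete routine, a minimal sketch assuming the standard CRC-32 conditioning (an initial value and a final exclusive-or of 0xffffffff) could look like this:

            #include <stddef.h>
            #include <stdint.h>

            /* Bit-wise, reflected CRC-32 sketch (polynomial 0xedb88320,
               initial value and final xor of 0xffffffff). Start with crc = 0. */
            uint32_t crc32_bitwise(uint32_t crc, const unsigned char *data, size_t len) {
                crc = ~crc;                         /* apply the initial value */
                for (size_t i = 0; i < len; i++) {
                    crc ^= data[i];                 /* feed the byte into the low bits */
                    for (unsigned k = 0; k < 8; k++)
                        crc = crc & 1 ? (crc >> 1) ^ 0xedb88320 : crc >> 1;
                }
                return ~crc;                        /* apply the final exclusive-or */
            }

    Called as crc32_bitwise(0, data, len), that sketch returns the CRC-32 of the buffer under those assumed parameters.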

    The same thing for a byte-wise table lookup, first for unreflected:

            crc = (crc << 8) ^ table_byte[((crc >> 24) ^ data[i]) & 0xff];
    

    and then reflected:

            crc = (crc >> 8) ^ table_byte[(crc ^ data[i]) & 0xff];
    

    In both cases, one shift per input byte is avoided by the reflected implementation.
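    The byte-wise versions rely on a precomputed 256-entry table. As a sketch, the table for the reflected case could be filled in once like this (the initialization function name is arbitrary; the table name matches the snippet above):

            /* Build the 256-entry table used by the reflected byte-wise lookup:
               entry n is the reflected CRC of the single byte n with a zero
               initial register. */
            static uint32_t table_byte[256];

            static void crc32_make_table(void) {
                for (uint32_t n = 0; n < 256; n++) {
                    uint32_t crc = n;
                    for (unsigned k = 0; k < 8; k++)
                        crc = crc & 1 ? (crc >> 1) ^ 0xedb88320 : crc >> 1;
                    table_byte[n] = crc;
                }
            }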