CRC calculation: Polynomial division with bytewise message XOR-ing?

Trying to understand what is meant in this bit of pseudocode from "Code fragment 3" in the following Wikipedia page:

function crc(byte array string[1..len], int len) {
    remainderPolynomial := 0
    // A popular variant complements remainderPolynomial here; see § Preset to −1 below
    for i from 1 to len {
        remainderPolynomial := remainderPolynomial xor polynomialForm(string[i]) * xn−8
        for j from 1 to 8 {    // Assuming 8 bits per byte
            if coefficient of xn−1 of remainderPolynomial = 1 {
                remainderPolynomial := (remainderPolynomial * x) xor generatorPolynomial
            } else {
                remainderPolynomial := (remainderPolynomial * x)
            }
        }
    }
    // A popular variant complements remainderPolynomial here; see § Post-invert below
    return remainderPolynomial
}

The article states:

In software, it is convenient to note that while one may delay the xor of each bit until the very last moment, it is also possible to do it earlier. It is usually convenient to perform the xor a byte at a time, even in a bit-at-a-time implementation like this:

OK, that sounds great. I need to implement CRC-16-IBM on a microcontroller and would prefer not to use the lookup table method (despite performance disadvantages). It would be interesting to explore this possibility of byte-wise xor calculation, rather than bit-by-bite. Yet, I don't quite understand what they mean.

While the code will be written in assembler, I am using Python to explore ideas and understand all the details. Here's CRC-16 code I wrote that, as far as I can tell, returns correct values:

# Flip the bits of an integer
# It will only flip the lower <bits> bits
def flip(x:int, bits:int=16):
    result = 0
    for i in range(bits):  # Reflect reg
        result <<= 1
        temp = x & (0x0001 << i)
        if temp:
            result |= 0x0001
    return result

def test_flip():
    n = 0b1100_1000

    bits = 8
    print(f"\nin: {n:0{bits}b}")
    print(f"out:{flip(n, bits):0{bits}b}")

    bits = 16
    print(f"\nin: {n:0{bits}b}")
    print(f"out:{flip(n,bits):0{bits}b}")

# test_flip()


def crc16(data:bytearray,
            poly:int=0x8005,
            init:int=0x0000,
            ref_in:bool=True,
            ref_out:bool=True,
            xor_out:int=-0x0000,
            debug:bool=False,
            disable_poly_xor:bool=False):
    reg = init                        # Initial CRC value
    for byte in data:                 # For each byte..
        if not ref_in:                # Reflect the input if necessary
            byte = flip(byte, 8)

        for i in range(8):            # Process all 8 bits one at a time
            reg <<= 1                 # Shift register left by one bit

            reg |= byte & 0x01      # Shift data each bit into the register on at a time
                                    # LSB first, unless input is NOT reflected

            if not disable_poly_xor:
                if reg & 0x010000:    # If a 1 shifts out of the 16 bit register,
                    reg ^= poly       # xor with polynomial

            byte >>= 1                # Prepare next bit to shift into register

            if debug:
                reg_str = f"{reg:032b}"
                print(f"{reg_str[-17:-16]} {reg_str[-16:-12]} {reg_str[-12:-8]} {reg_str[-8:-4]} {reg_str[-4:]} < {byte:08b}")

        if debug: print()
        reg &= 0xFFFF                 # This isn't a 16 bit hardware register, get rid of the excess

    if ref_out:
        return flip(reg, 16) ^ xor_out
    else:
        return reg ^ xor_out


# Checking implementations
check_data = bytearray(b"123456789\x00\x00")
check_result_should_be = 0xBB3D

print(f"\ndata: {[hex(n) for n in check_data]}")
print(f"expected CRC: {check_result_should_be: 04x}")

print()

print(f"crc16: {crc16(check_data, ref_in=False, ref_out=False): 04x}")
print(f"crc16: {crc16(check_data, ref_in=False, ref_out=True): 04x}")
print(f"crc16: {crc16(check_data, ref_in=True,  ref_out=False): 04x}")
print(f"crc16: {crc16(check_data, ref_in=True,  ref_out=True): 04x}")

I followed this reference document in order to produce the implementation you see above:

https://zlib.net/crc_v3.txt

I have looked through many CRC implementations in a range of languages (including many here on SO). I have not seen a single one that appends zeros to the data, as the paper seems to indicate is necessary. My test data above is bytearray(b"123456789"), yet I padded it with 16 bits of zeros. Without that I do not get the right CRC output values.

Per that paper, the CRC-16-IBM version I am implementing should output 0xBB3D, which is exactly what I get when the ref_in and ref_out are set to True.

Which brings me to the Wikipedia bytewise xor pseudocode. Here's how I interpret the pseudocode:

def crc16_1(data:bytearray,
            poly:int=0x8005,
            init:int=0x0000,
            ref_in:bool=True,
            ref_out:bool=True,
            xor_out:int=-0x0000,
            debug:bool=False,
            disable_poly_xor:bool=False):
    reg = init                        # Initial CRC value
    for byte in data:                 # For each byte..
        if not ref_in:                # Reflect the input if necessary
            byte = flip(byte, 8)

        reg ^= byte

        for i in range(8):            # Process all 8 bits one at a time
            if not disable_poly_xor:
                if reg & 0x8000:    # If a 1 is going to shift out of the 16 bit register,
                    reg <<= 1
                    reg ^= poly     # xor with polynomial
                else:
                    reg <<= 1       # Shift register left by one bit

            if debug:
                reg_str = f"{reg:032b}"
                print(f"{reg_str[-17:-16]} {reg_str[-16:-12]} {reg_str[-12:-8]} {reg_str[-8:-4]} {reg_str[-4:]} < {byte:08b}")

        if debug: print()
        reg &= 0xFFFF                 # This isn't a 16 bit hardware register, get rid of the excess

    if ref_out:
        return flip(reg, 16) ^ xor_out
    else:
        return reg ^ xor_out

Adding this to my test:

# Checking implementations
check_data = bytearray(b"123456789\x00\x00")
check_result_should_be = 0xBB3D

print(f"\ndata: {[hex(n) for n in check_data]}")
print(f"expected CRC: {check_result_should_be: 04x}")

print()

print(f"crc16: {crc16(check_data, ref_in=False, ref_out=False): 04x}")
print(f"crc16: {crc16(check_data, ref_in=False, ref_out=True): 04x}")
print(f"crc16: {crc16(check_data, ref_in=True,  ref_out=False): 04x}")
print(f"crc16: {crc16(check_data, ref_in=True,  ref_out=True): 04x}")

print()

print(f"crc16_1: {crc16_1(check_data, ref_in=False, ref_out=False): 04x}")
print(f"crc16_1: {crc16_1(check_data, ref_in=False, ref_out=True): 04x}")
print(f"crc16_1: {crc16_1(check_data, ref_in=True,  ref_out=False): 04x}")
print(f"crc16_1: {crc16_1(check_data, ref_in=True,  ref_out=True): 04x}")

I get:

data: ['0x31', '0x32', '0x33', '0x34', '0x35', '0x36', '0x37', '0x38', '0x39', '0x0', '0x0']
expected CRC:  bb3d

crc16:  fee8
crc16:  177f
crc16:  bcdd
crc16:  bb3d

crc16_1:  5e8b
crc16_1:  d17a
crc16_1:  6a07
crc16_1:  e056

Which isn't correct.

I hope someone can help me understand how to get from their pseudocode to real code that can past the simple test for this or any other CRC configuration. The definition for the CRC algorithm I am trying to implement is (also from the linked paper):

Name   : "CRC-16"
Width  : 16
Poly   : 8005
Init   : 0000
RefIn  : True
RefOut : True
XorOut : 0000
Check  : BB3D

The check value at the bottom is what I am using to ensure my code is working correctly.

I also took a look at the MODBUS CRC implementation, per page 40 of this document:

https://www.modbus.org/docs/Modbus_over_serial_line_V1_02.pdf

Here's my implementation:

def crc16_2(data:bytearray,
            poly:int=0x8005,
            init:int=0x0000,
            ref_in:bool=True,
            ref_out:bool=True,
            xor_out:int=-0x0000,
            debug:bool=False,
            disable_poly_xor:bool=False):
    reg = init                        # Initial CRC value
    for byte in data:                 # For each byte..
        if not ref_in:                # Reflect the input if necessary
            byte = flip(byte, 8)

        reg ^= byte

        for i in range(8):            # Process all 8 bits one at a time
            if not disable_poly_xor:
                if reg & 0x0001:    # If a 1 is going to shift out of the 16 bit register,
                    reg >>= 1
                    reg ^= poly     # xor with polynomial
                else:
                    reg >>= 1       # Shift register left by one bit

            if debug:
                reg_str = f"{reg:032b}"
                print(f"{reg_str[-17:-16]} {reg_str[-16:-12]} {reg_str[-12:-8]} {reg_str[-8:-4]} {reg_str[-4:]} < {byte:08b}")

        if debug: print()
        reg &= 0xFFFF                 # This isn't a 16 bit hardware register, get rid of the excess

    if ref_out:
        return flip(reg, 16) ^ xor_out
    else:
        return reg ^ xor_out

Note that they flip the polynomial and shift right, rather than left. They do xor one byte at a time, which is interesting.

Once again, no configuration produces the expected CRC value:

data: ['0x31', '0x32', '0x33', '0x34', '0x35', '0x36', '0x37', '0x38', '0x39', '0x0', '0x0']
expected CRC:  bb3d

crc16:  fee8 < correct
crc16:  177f < correct
crc16:  bcdd < correct
crc16:  bb3d < correct

crc16_1:  5e8b
crc16_1:  d17a
crc16_1:  6a07
crc16_1:  e056

crc16_2:  295
crc16_2:  a940
crc16_2:  44ad
crc16_2:  b522

EDIT:

This is a really nice resource contributed by Mark Adler:

https://github.com/madler/crcany

Solution

I only have Python 2.7. This seems to work.

    def flip(x=0, bits=16):
        result = 0
        for i in range(bits):  # Reflect reg
            result <<= 1
            temp = x & (0x0001 << i)
            if temp:
                result |= 0x0001
        return result
    
    def crc16(bytearray,
                poly=0x8005,
                init=0x0000,
                ref_in=True,
                ref_out=True,
                xor_out=0x0000):
        reg = init
        for byte in data:
            if ref_in:
                byte = flip(byte, 8)
            reg ^= byte << 8
            for i in range(8):
                if reg & 0x08000:
                    reg = (reg << 1) ^ poly
                else:
                    reg = (reg << 1)
                reg &= 0xffff
        if ref_out:
            return flip(reg, 16) ^ xor_out
        else:
            return reg ^ xor_out
    
    data = [0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39]
    
    print('0x%04x' % crc16(data, 0x8005, 0x0000, True, True, 0x0000))

Since the goal here is a reflected CRC16, this is a simpler right shifting CRC function that eliminates the need to do to bit flips. This means the right most bit of data and the CRC register are the logical most significant bits. The polynomial is also reflected from 0x8005 to 0xa001.

def crc16r(bytearray,
            poly=0xa001,
            init=0x0000,
            xor_out=0x0000):
    reg = init
    for byte in data:
        reg ^= byte
        for i in range(8):
            if reg & 0x0001:
                reg = (reg >> 1) ^ poly
            else:
                reg = (reg >> 1)
    return reg ^ xor_out

data = [0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39]

print('0x%04x' % crc16r(data, 0xa001, 0x0000, 0x0000))