Attempting to understand different CRC implementations

Taken from IEEE 802.3,

Mathematically, the CRC value corresponding to a given MAC frame is defined by the following procedure:

a) The first 32 bits of the frame are complemented.
b) The n bits of the protected fields are then considered to be the coefficients of a polynomial M(x) of degree n – 1. (The first bit of the Destination Address field corresponds to the x(n–1) term and the last bit of the MAC Client Data field (or Pad field if present) corresponds to the x0 term.)
c) M(x) is multiplied by x32 and divided by G(x), producing a remainder R(x) of degree ≤ 31.
d) The coefficients of R(x) are considered to be a 32-bit sequence.
e) The bit sequence is complemented and the result is the CRC.

https://www.kernel.org/doc/Documentation/crc32.txt

A big-endian CRC written this way would be coded like:

for (i = 0; i < input_bits; i++) {
    multiple = remainder & 0x80000000 ? CRCPOLY : 0;
    remainder = (remainder << 1 | next_input_bit()) ^ multiple;
}

Where is part c) M(x) is multiplied by x^32? I don't see 32 zeros appended to any number.

Also the following piece of code make no sense to me. The code and math don't really match up.

Evaluating the differences in CRC-32 implementations and

unsigned short
crc16_update(unsigned short crc, unsigned char nextByte)
{
    crc ^= nextByte;

    for (int i = 0; i < 8; ++i) {
        if (crc & 1)
            crc = (crc >> 1) ^ 0xA001;
        else
            crc = (crc >> 1);
    }

    return crc;
}

What are these implementations doing? None of them really resemble the original procedure.

Even after reading the very end of this it still makes no sense: http://www.relisoft.com/science/crcmath.html

Solution

This tutorial (also here, here, and here for those who will complain about link rot), in particular "10. A Slightly Mangled Table-Driven Implementation", explains well the optimization to avoid feeding an extra 32 zero bits at the end.

The bottom line is that you feed the bits into the end of the register instead of the start, which has the same effect as feeding a register-length's worth of zeros at the end.

The tutorial also shows nicely how the implementation you quoted implements the long division over GF(2).