CRC calculation reduction

I have one math and programming related question about CRC calculations, to avoid recompute full CRC for a block when you must change only a small portion of it.

My problem is the following: I have a 1K block of 4 byte structures, each one representing a data field. The full 1K block has a CRC16 block at the end, computed over the full 1K. When I have to change only a 4 byte structure, I should recompute the CRC of the full block but I'm searching for a more efficient solution to this problem. Something where:

I take the full 1K block current CRC16
I compute something on the old 4 byte block
I "subtract" something obtained at step 2 from the full 1K CRC16
I compute something on the new 4 byte block
I "add" something obtained at step 4 to the result obtained at step 3

To summarize, I am thinking about something like this:

CRC(new-full) = [CRC(old-full) - CRC(block-old) + CRC(block-new)]

But I'm missing the math behind and what to do to obtain this result, considering also a "general formula".

Thanks in advance.

Solution

Take your initial 1024-byte block A and your new 1024-byte block B. Exclusive-or them to get block C. Since you only changed four bytes, C will be bunch of zeros, four bytes which are the exclusive-or of the previous and new four bytes, and a bunch more zeros.

Now compute the CRC-16 of block C, but without any pre or post-processing. We will call that CRC-16'. (I would need to see the specific CRC-16 you're using to see what that processing is, if anything.) Exclusive-or the CRC-16 of block A with the CRC-16' of block C, and you now have the CRC-16 of block B.

At first glance, this may not seem like much of a gain compared to just computing the CRC of block B. However there are tricks to rapidly computing the CRC of a bunch of zeros. First off, the zeros preceding the four bytes that were changed give a CRC-16' of zero, regardless of how many zeros there are. So you just start computing the CRC-16' with the exclusive-or of the previous and new four bytes.

Now you continue to compute the CRC-16' on the remaining n zeros after the changed bytes. Normally it takes O(n) time to compute a CRC on n bytes. However if you know that they are all zeros (or all some constant value), then it can be computed in O(log n) time. You can see an example of how this is done in zlib's crc32_combine() routine, and apply that to your CRC.

Given your CRC-16/DNP parameters, the zeros() routine below will apply the requested number of zero bytes to the CRC in O(log n) time.

// Return a(x) multiplied by b(x) modulo p(x), where p(x) is the CRC
// polynomial, reflected. For speed, this requires that a not be zero.
uint16_t multmodp(uint16_t a, uint16_t b) {
    uint16_t m = (uint16_t)1 << 15;
    uint16_t p = 0;
    for (;;) {
        if (a & m) {
            p ^= b;
            if ((a & (m - 1)) == 0)
                break;
        }
        m >>= 1;
        b = b & 1 ? (b >> 1) ^ 0xa6bc : b >> 1;
    }
    return p;
}

// Table of x^2^n modulo p(x).
uint16_t const x2n_table[] = {
    0x4000, 0x2000, 0x0800, 0x0080, 0xa6bc, 0x55a7, 0xfc4f, 0x1f78,
    0xa31f, 0x78c1, 0xbe76, 0xac8f, 0xb26b, 0x3370, 0xb090
};

// Return x^(n*2^k) modulo p(x).
uint16_t x2nmodp(size_t n, unsigned k) {
    k %= 15;
    uint16_t p = (uint16_t)1 << 15;
    for (;;) {
        if (n & 1)
            p = multmodp(x2n_table[k], p);
        n >>= 1;
        if (n == 0)
            break;
        if (++k == 15)
            k = 0;
    }
    return p;
}

// Apply n zero bytes to crc.
uint16_t zeros(uint16_t crc, size_t n) {
    return multmodp(x2nmodp(n, 3), crc);
}