python3 and numba, wrong calculation of adler32 checksum

I have written the same algorithm in 3 flavours:

plain Python
Python optimized with Numba
Micropython optimized with Viper

I am sure about the correctness of the routine in plain Python and Micropython: the checksum on some megabytes of random data is equal the checksum computed by zlib.adler32

This is the routine written for Numba:

@numba.jit(nopython = True)
def adler_32(data: bytes, length: np.int32, init: np.uint32) -> np.uint32 :
# note: on the first call, init should be 1
    ADLER = np.uint16(65521)
    NMAX = np.int32(5552)                               # max iterations to keep s1, s2 as uint16

    s1 = np.uint16(init & 0xffff)
    s2 = np.uint16(init >> 16)
    i = 0
    while length > 0 :
        k = min(length, NMAX)

        length -= k
        while k :
            s1 += np.uint8(data[i])
            k -= 1
            s2 += s1
            i += 1

        if s1 >= ADLER :
            s1 -= ADLER
        if s2 >= ADLER :
            s2 -= ADLER

    return  (np.uint32(s2) << 16) | s1

Problem: s1 and s2 assumes values greater then 0xffff, so the results are incorrect.

Why s1 and s2, declared as uint16, get bigger values? Are there any caveats or side effects I don't see?

ERRATA CORRIGE

This is obviously wrong, in a hurry I copy & paste it from a microcontroller routine where the data are small and s1, s2 stay in uint16.

        if s1 >= ADLER :
            s1 -= ADLER
        if s2 >= ADLER :
            s2 -= ADLER

The correct, general algorithm, is

    s1 %= ADLER
    s2 %= ADLER

Solution

You need to understand the magic number in there: 5552. One more than that, 5553, is the number of iterations in the worst case that will overflow s2 if it is a 32-bit unsiged integer. Both s1 and s2 need to be declared as 32 bits. Not 16 bits.

Worst case is s1 and s2 starting at their highest values, 65520, and all of the bytes being 255. Then s2 becomes (1 + n) (65520 + 255 n / 2), where n is the number of bytes, all with value 255. Setting that equal to 2³² and solving for n, you get 5552.19.

Another error is that if sx > ADLER: sx -= ADLER falls well short of assuring that sx is less than ADLER, which it needs to be at that point in the loop. Again, for 5552 to have any meaning. Those both need to be sx %= ADLER.