python zip zlib deflate crc32

ZIP's CRC-32 for encryption isn't quite zlib's crc32... why?


I'm writing my own unzip code, and (from trial and error, not understanding) it looks like the one-byte CRC-32 step that ZIP decryption requires doesn't quite match zlib's crc32. To convert from one to the other:

import zlib

def crc32(ch, crc):
    crc = zlib.crc32(bytes([~ch & 0xFF]), crc)
    return (~crc & 0xFF000000) + (crc & 0x00FFFFFF)

Why is this? (Or am I wrong?)


Edit: the reason I think I might be right is that I use the above at https://github.com/uktrade/stream-unzip/blob/d23400028abbe3b0d7e1951cb562cd0541bfc960/stream_unzip.py#L89 to successfully decrypt encrypted ZIP files:

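def decrypt(chunks):
    # password, mod_time and get_byte_readers are not defined in this excerpt;
    # they come from the surrounding code in the linked file.
    key_0 = 305419896  # 0x12345678
    key_1 = 591751049  # 0x23456789
    key_2 = 878082192  # 0x34567890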

    def crc32(ch, crc):
        crc = zlib.crc32(bytes([~ch & 0xFF]), crc)
        return (~crc & 0xFF000000) + (crc & 0x00FFFFFF)

    def update_keys(byte):
        nonlocal key_0, key_1, key_2
        key_0 = crc32(byte, key_0)
        key_1 = (key_1 + (key_0 & 0xFF)) & 0xFFFFFFFF
        key_1 = ((key_1 * 134775813) + 1) & 0xFFFFFFFF  # 134775813 == 0x08088405
        key_2 = crc32(key_1 >> 24, key_2)

    def decrypt(chunk):
        chunk = bytearray(chunk)
        for i, byte in enumerate(chunk):
            temp = key_2 | 2
            byte ^= ((temp * (temp ^ 1)) >> 8) & 0xFF
            update_keys(byte)
            chunk[i] = byte
        return chunk

    yield_all, _, get_num, _ = get_byte_readers(chunks)

    for byte in password:
        update_keys(byte)

    if decrypt(get_num(12))[11] != mod_time >> 8:
        raise ValueError('Incorrect password')

    for chunk in yield_all():
        yield decrypt(chunk)

However, if I replace the crc32 function above with a plain call to zlib's, decryption fails (e.g. it complains about an incorrect password).
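
The mismatch can be reproduced without any ZIP file. The following is a minimal sketch and not part of the linked code: `workaround_crc32` is the function above under a new name, `plain_crc32` is a hypothetical "just call zlib" replacement, and the loop counts how many byte values give different results when starting from the initial `key_0`.

```python
import zlib

def workaround_crc32(ch, crc):
    # Same body as the crc32 helper in the question.
    crc = zlib.crc32(bytes([~ch & 0xFF]), crc)
    return (~crc & 0xFF000000) + (crc & 0x00FFFFFF)

def plain_crc32(ch, crc):
    # Hypothetical direct replacement: pass the byte and running value to zlib as-is.
    return zlib.crc32(bytes([ch]), crc)

key_0 = 305419896  # initial key_0 from the decrypt function above

# Count the byte values for which the two functions disagree.
mismatches = sum(
    workaround_crc32(ch, key_0) != plain_crc32(ch, key_0)
    for ch in range(256)
)
print(mismatches, "of 256 byte values give different results")
```

Because each key update feeds into the next one, a single differing update is enough to corrupt every later key, which would explain the failed password check.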


Solution

  • OK, you're not completely wrong. It is indeed the same CRC-32 algorithm, but without the pre- and post-processing: zlib.crc32 inverts the CRC on the way in and again on the way out, and the ZIP key update does neither. Your code is a truly odd way of undoing those inversions with the zlib.crc32 function. All you need is to invert the CRC yourself before and after the call:

    def crc32(ch, crc):
        return ~zlib.crc32(bytes([ch]), ~crc) & 0xffffffff
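
To check this against something independent of zlib, here is a small sketch. `raw_crc32_update` is a name made up for this example: it is a bit-by-bit CRC-32 step using the standard reflected polynomial 0xEDB88320, with no inversion on the way in or out. `crc32` is the one-liner above written with an explicit XOR, and `workaround_crc32` is the function from the question.

```python
import zlib

MASK = 0xFFFFFFFF

def raw_crc32_update(ch, crc):
    # One byte of CRC-32 with no pre- or post-inversion (illustrative reference).
    crc ^= ch
    for _ in range(8):
        crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc

def crc32(ch, crc):
    # The one-liner above; XOR with MASK is the 32-bit equivalent of ~.
    return zlib.crc32(bytes([ch]), crc ^ MASK) ^ MASK

def workaround_crc32(ch, crc):
    # The function from the question.
    crc = zlib.crc32(bytes([~ch & 0xFF]), crc)
    return (~crc & 0xFF000000) + (crc & 0x00FFFFFF)

# Compare all three for every byte value and a few starting CRCs,
# including the initial ZIP keys from the question.
for start in (0, 305419896, 591751049, 878082192, MASK):
    for ch in range(256):
        expected = raw_crc32_update(ch, start)
        assert crc32(ch, start) == expected
        assert workaround_crc32(ch, start) == expected
print("all three agree")
```

The question's version reaches the same result by a different route: inverting the input byte cancels the incoming inversion in the low 8 bits that select the table entry, and after the 8-bit shift the two inversions cancel everywhere except the top byte, which the final line flips back.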