python compression reverse-engineering zlib

Guess configuration to inflate zlib compressed data


I want to inflate some zlib-compressed data. I've tried the following in Python:

zlib.decompress(data)

-> it returns the following error: zlib.error: Error -3 while decompressing data: incorrect data check

So I found a way to ignore the data check:

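import zlib
from io import BytesIO

def decompress_corrupted(data):
    # MAX_WBITS | 32 auto-detects a zlib or gzip header
    d = zlib.decompressobj(zlib.MAX_WBITS | 32)
    f = BytesIO(data)
    result_str = b''
    buffer = f.read(1)
    try:
        # Feed one byte at a time so the output inflated before an error is kept
        while buffer:
            result_str += d.decompress(buffer)
            buffer = f.read(1)
    except zlib.error:
        # Swallows every zlib error, including the failed data check at the end
        pass
    return result_str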

But the result is partially "corrupted": I get RTF content with a few mistakes.

My question: since I know the compression uses the zlib algorithm, which configuration options (or pre/post-processing steps) could I try in order to recover the original document?

Context: the software used to compress these files is no longer maintained, and its vendor has never answered our messages. We only have a compiled viewer, but we need the exact algorithm in order to migrate to an alternative solution. We know these files are not corrupted, since the current viewer displays them properly.

If it can help:

  • here is the head of the data: 78 9C 95 54 5D 6F D3 30 14 ...
  • and the tail: ... 79 AE C5 E2 17 82 5E 3F 85

Solution

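  • No "configuration" is needed. zlib's inflate losslessly restores the original content from any valid zlib stream, whatever compression level or strategy produced it.

    The "incorrect data check" error means that the Adler-32 checksum computed over the inflated output does not match the one stored at the end of the stream. Therefore, despite your attestation, your data is getting corrupted or deliberately modified somewhere along the way.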
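
To see where the stream stops being valid, it may help to inflate the raw deflate body without any checksum verification and then compare the checksums by hand. The following is only a minimal sketch, assuming `data` holds the complete stream, that it starts with a standard 2-byte zlib header (the 78 9C at the head suggests so), and that no preset dictionary is used. `inspect_zlib_stream` is a hypothetical helper name, not part of the zlib module.

```python
import zlib


def inspect_zlib_stream(data):
    """Inflate the raw deflate body of a zlib stream and report on its trailer.

    Assumed layout: 2-byte header, raw deflate data, 4-byte big-endian
    Adler-32 of the uncompressed content.
    """
    # Negative wbits selects raw deflate: no header is expected and no
    # checksum is verified, so a wrong trailer cannot raise an error here.
    d = zlib.decompressobj(-zlib.MAX_WBITS)
    output = d.decompress(data[2:])  # skip the 2-byte zlib header

    # Whatever follows the final deflate block ends up in unused_data.
    # For an intact zlib stream this is exactly the 4-byte Adler-32.
    trailer = d.unused_data
    stored = int.from_bytes(trailer[:4], "big") if len(trailer) >= 4 else None
    computed = zlib.adler32(output)

    return {
        "output": output,
        "stream_ended": d.eof,      # True once the final deflate block was seen
        "trailer_len": len(trailer),
        "stored_adler32": stored,
        "computed_adler32": computed,
        "checksum_ok": stored == computed,
    }
```

If `stream_ended` is true but `checksum_ok` is false, the deflate body still decodes while the output and the stored checksum disagree, which points to bytes altered inside the body or the trailer rather than to a missing decompression setting. A `trailer_len` other than 4 would suggest extra or missing bytes at the end. If the body itself is not valid deflate data, `d.decompress` raises `zlib.error` and this sketch lets it propagate.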