c++ c zlib

What could cause "invalid distance too far back", and how can zlib be modified to fix it?


I am trying to decompress a raw stream of data from a third-party source. The data is compressed with the zlib library (version 1.2.13) and transmitted over TCP. Using Wireshark and a mix of reverse-engineering methods, I was able to capture both the compressed and the uncompressed form of the data:

Compressed form: 0xCA 0x05 0xDB 0xC8 0xE8 0x07 0x22 0x01 0x00

Uncompressed form: 0x6D 0x4D 0x7D 0x9B 0x7C 0x07 0x01 0x4E 0x7D 0x9B 0x7C 0x07 0x00

    z_stream strm;
    unsigned char in[9] = {0xCA, 0x05, 0xDB, 0xC8, 0xE8, 0x07, 0x22, 0x01, 0x00};
    unsigned char out[65535] = {0};

    /* Use zlib's default allocators. */
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    /* The zlib manual asks for next_in and avail_in to be set before inflateInit2(). */
    strm.next_in = in;
    strm.avail_in = sizeof(in);

    /* Negative windowBits selects raw deflate (no zlib header); 15 means a 32 KiB window. */
    int ret = inflateInit2(&strm, -15);
    if (ret != Z_OK)
        return ret;

    strm.next_out = out;
    strm.avail_out = sizeof(out);

    ret = inflate(&strm, Z_SYNC_FLUSH);

Initially, the inflate function returned -3 (Z_DATA_ERROR) with the message "invalid distance too far back". I then recompiled zlib with two modifications: I added the -DINFLATE_ALLOW_INVALID_DISTANCE_TOOFAR_ARRR compiler flag and changed sane to 0 in the inflateResetKeep function to allow invalid distances.

After these modifications, calling inflate gives the following result in the output buffer:

0x6D 0x00 0x00 0x00 0x00 0x00 0x01 0x4E 0x00 0x00 0x00 0x00 0x00

I've tried to debug this deflate stream with the infgen tool, but it reports an error saying the deflate stream is incomplete.

There is a chance that the deflate stream comes from a modified zlib library (but I am not sure about it). Could anyone point me in the right direction, please?


Solution

  • Your data is a fragment of a deflate stream. Disassembled, those bits are:

    ! infgen 3.0 output
    !
    fixed
    literal 'm
    match 5 114
    infgen warning: distance too far back (114/1)
    literal 1 'N
    match 4 6
    end
    !
    stored
    infgen warning: incomplete deflate data
    

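    The first match asks for a distance of 114 when only one byte ('m') has been output so far, hence the warning (114/1). It needs five bytes starting 113 bytes before the start of this data, which would have been produced by an earlier portion of the deflate stream. Later, four of those five bytes that you don't have are repeated by the second match.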

    It is not possible to get the data you are expecting just from that fragment of a deflate stream. The 0x4D 0x7D 0x9B 0x7C 0x07 you are expecting has to come from somewhere else.
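To make the disassembly concrete, here is a minimal sketch in plain C (no zlib involved) that replays the five symbols infgen printed for the first block. The names `Symbol`, `fragment` and `replay` are made up for this illustration and are not part of zlib or infgen. `replay` copies match bytes from the output produced so far and, when a distance reaches further back than that, from a caller-supplied history buffer that stands in for the earlier uncompressed data.

```c
#include <stddef.h>

/* One decoded deflate symbol: a literal byte or a (length, distance) match. */
typedef struct {
    int is_match;           /* 0 = literal, 1 = match */
    unsigned char literal;  /* valid when is_match == 0 */
    size_t length;          /* valid when is_match == 1 */
    size_t distance;        /* valid when is_match == 1 */
} Symbol;

/* The symbols from the infgen listing for the first (fixed) block. */
const Symbol fragment[] = {
    {0, 0x6D, 0, 0},   /* literal 'm  */
    {1, 0, 5, 114},    /* match 5 114 */
    {0, 0x01, 0, 0},   /* literal 1   */
    {0, 0x4E, 0, 0},   /* literal 'N  */
    {1, 0, 4, 6},      /* match 4 6   */
};
const size_t fragment_len = sizeof fragment / sizeof fragment[0];

/*
 * Replay symbols into out[]. hist[0..hist_len) is the uncompressed data that
 * came immediately before this fragment (hist_len may be 0).
 * Returns the number of bytes written, or -1 if a match reaches back past
 * all available data ("distance too far back") or out[] is too small.
 */
long replay(const Symbol *syms, size_t n,
            const unsigned char *hist, size_t hist_len,
            unsigned char *out, size_t out_cap)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        if (!syms[i].is_match) {
            if (pos >= out_cap)
                return -1;
            out[pos++] = syms[i].literal;
            continue;
        }
        size_t d = syms[i].distance;
        if (d == 0 || d > pos + hist_len)
            return -1;                      /* distance too far back */
        for (size_t k = 0; k < syms[i].length; k++) {
            if (pos >= out_cap)
                return -1;
            /* Source byte is in out[] if close enough, else in the history. */
            out[pos] = (d <= pos) ? out[pos - d]
                                  : hist[hist_len - (d - pos)];
            pos++;
        }
    }
    return (long)pos;
}
```

Called with an empty history (`hist_len` of 0), `replay` returns -1 at the first match, which corresponds to the warning infgen printed. Called with a hypothetical 113-byte history whose first five bytes are 0x4D 0x7D 0x9B 0x7C 0x07 (the remaining bytes are unknown and can be zero-filled for the experiment), it writes 12 bytes: 6D 4D 7D 9B 7C 07 01 4E 7D 9B 7C 07, which agrees with the start of the uncompressed form in the question. In other words, the expected output is only reachable when the decompressor still holds the data that preceded this fragment.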