I need to decompress some zlib-compressed files found within a game's save data. I have no access to the game's source. Each file begins with 0x789C, which tells me that they are indeed compressed with zlib. However, all calls to inflate on these files fail to decompress fully and return Z_DATA_ERROR. I have tried zlib versions 1.2.5, 1.2.8, and 1.2.11 with identical results.
Even though zlib is telling me the input data is corrupt, I'm confident that it is not, since the game is able to decompress these files with no issues AND this is not isolated to a single data stream. I have hundreds of thousands of unique data streams compressed the same way, and they all throw a Z_DATA_ERROR somewhere in the middle of the decompression.
Furthermore, the data that IS successfully decompressed before the error is correct. The output is exactly as expected.
Also, about 10% of the time, zlib WILL decompress the entire file, but the result is not correct. Large chunks of the decompressed data contain the same byte repeated over and over, which tells me the success was a false positive.
Here's my decompression code:
    // QByteArray is a Qt wrapper for a char *
    QByteArray Compression::DecompressData(QByteArray data)
    {
        QByteArray result;
        int ret;
        z_stream strm;
        static const int CHUNK_SIZE = 1; // set to 1 just for debugging
        char out[CHUNK_SIZE];
        strm.zalloc = Z_NULL;
        strm.zfree = Z_NULL;
        strm.opaque = Z_NULL;
        strm.avail_in = data.size();
        strm.next_in = (Bytef*)(data.data());
        ret = inflateInit2(&strm, -15);
        if (ret != Z_OK)
        {
            qDebug() << "init error" << ret;
            return QByteArray();
        }
        do
        {
            strm.avail_out = CHUNK_SIZE;
            strm.next_out = (Bytef*)(out);
            ret = inflate(&strm, Z_NO_FLUSH);
            // This tells me which input byte caused the failure
            qDebug() << "debugging output: " << ret << QString::number(strm.total_in, 16);
            Q_ASSERT(ret != Z_STREAM_ERROR);
            switch (ret)
            {
            case Z_NEED_DICT:
                ret = Z_DATA_ERROR;
            case Z_DATA_ERROR:
            case Z_MEM_ERROR:
                (void)inflateEnd(&strm);
                return result;
            }
            result.append(out, CHUNK_SIZE - strm.avail_out);
        } while (strm.avail_out == 0);
        inflateEnd(&strm);
        return result;
    }
Here is a pastebin of an example file's compressed data with the 0x789C header and trailing CRC removed. I can supply literally endless example files. All of them have the same issue.
Running that data through the above function will decompress the beginning of the stream correctly, but fail on input byte 0x18C. You can tell it decompressed correctly when the start of the file begins with 0x000B and the decompressed data is longer than the input data.
I wish I knew more about deflate compression to solve this problem myself. My initial thoughts are that the game has decided to use a custom version of zlib or an extra parameter needs to be given to zlib in order to decompress it correctly. I've asked around and tried many things for days, and I really need someone with knowledge on the subject to weigh in here. Thanks for your time!
The provided data is indeed an invalid deflate stream: it contains distances that reach too far back, and there are eight bytes of junk after the deflate stream has ended. There is nothing apparently wrong with your code.
As you noted, at offset 396 there is the first of ten distances too far back. That's where inflate stops. At offset 3472, almost at the end, there is a stored block with a length that doesn't check against its complement.
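To make the stored-block check concrete: in the deflate format (RFC 1951), a stored block carries a 16-bit length LEN followed by NLEN, which must be the one's complement of LEN. The following is a minimal sketch of that check; the function name storedBlockLengthsMatch is made up for illustration and is not part of zlib, and it assumes the pointer is already positioned at the byte-aligned LEN field.

```cpp
#include <cstdint>

// Hypothetical helper, not a zlib function: checks the LEN/NLEN pair of a
// deflate stored block. `p` must point at the four bytes that follow the
// block header once the bit stream has been aligned to a byte boundary.
bool storedBlockLengthsMatch(const unsigned char* p)
{
    // Both fields are stored least-significant byte first.
    const std::uint16_t len  = static_cast<std::uint16_t>(p[0] | (p[1] << 8));
    const std::uint16_t nlen = static_cast<std::uint16_t>(p[2] | (p[3] << 8));

    // NLEN must be the one's complement of LEN, so the two XOR to 0xFFFF.
    return static_cast<std::uint16_t>(len ^ nlen) == 0xFFFF;
}
```

When this comparison fails, inflate reports a data error, which is what happens at offset 3472 in the sample.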
For the distances too far, you could try setting a dictionary of 32K zero bytes using inflateSetDictionary() right after inflateInit2(). Then the decompression would proceed, filling in the given locations with zeros. That may or may not be what the game is doing. There is no obvious remedy for the stored-block error.
Indeed, the game's authors may be deliberately messing with you, or with anyone else trying to decompress their internal data, by having modified zlib for their own use.