Tags: gzip, compression, gzipstream

Decompress a gzip file that contains multiple blocks


I have a gzip file that has multiple blocks. Every block starts with

1F 8B 08 

And ends with

00 00 FF FF

I tried to decompress the file using 7-Zip and the gzip tool on Linux, but I always get an error saying that the file is invalid. So I wrote this Python script:

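import zlib

CHUNKSIZE = 1  # one byte at a time, to pinpoint where decompression fails

data = b""
offset = 0  # number of bytes read from the file so far
d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS selects gzip framing
with open("file.gz", "rb") as f:
    buffer = f.read(CHUNKSIZE)
    while buffer:
        offset += len(buffer)
        data += d.decompress(buffer)
        print(offset)  # last offset printed is the last byte that decompressed cleanly
        buffer = f.read(CHUNKSIZE)

data += d.flush()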

I have noticed that when it reaches the header of the second block

00 00 00 FF FF 1F 8B 08

at the point between FF and 1F, the script raises

zlib.error: Error -3 while decompressing data: invalid block type

I set the chunk size to 1 so that I would know exactly where the problem is. I know that the problem is not in the file because I have multiple files constructed the same way and they show exactly the same error.


Solution

  • I know that the problem is not in the file because I have multiple files constructed the same way and they show exactly the same error.

    The conclusion is not that the problem is not in the file, but rather that the problem is in all of your files. Someone either inadvertently or deliberately constructed invalid gzip files. It looks like they did that by using Z_SYNC_FLUSH or Z_FULL_FLUSH instead of Z_FINISH to end each stream before starting another faux gzip stream. Both flush modes end their output with an empty stored block, which is the 00 00 FF FF you see before each new header. A valid gzip stream instead ends with a last block followed by an eight-byte gzip trailer containing two check values on the integrity of the uncompressed data: a CRC-32 and the length modulo 2^32. Your streams have neither, which is why gzip and 7-Zip reject them.
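To make the difference concrete, here is a minimal sketch of how such files might have been produced. It is only a guess at what the original writer did. It assumes Python's zlib with wbits set to 16 + MAX_WBITS for gzip framing; the helper name gzip_member and the sample payload are made up for illustration.

```python
import struct
import zlib


def gzip_member(payload, flush_mode):
    # Hypothetical helper: one gzip-framed stream ended with the given flush mode.
    c = zlib.compressobj(wbits=16 + zlib.MAX_WBITS)
    return c.compress(payload) + c.flush(flush_mode)


payload = b"example payload " * 4  # made-up sample data

# Ended with a sync flush: no final block and no trailer.
broken = gzip_member(payload, zlib.Z_SYNC_FLUSH)
# Ended properly: final block followed by the eight-byte trailer.
proper = gzip_member(payload, zlib.Z_FINISH)

print(broken[:3].hex(), broken[-4:].hex())

# The trailer is two little-endian 32-bit values: CRC-32, then length mod 2**32.
crc, size = struct.unpack("<II", proper[-8:])
print(crc == zlib.crc32(payload), size == len(payload))
```

The stream ended with Z_SYNC_FLUSH starts with 1f 8b 08 and stops at the empty stored block, matching the pattern in the question, while the stream ended with Z_FINISH carries the two check values that a gzip reader verifies.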

    You can nevertheless continue with decompression, though without the comfort of any integrity checking of the data, by simply picking up with a new instance of decompressobj when you get an error and see a new gzip header, 1f 8b 08.
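The restart approach can be sketched roughly as follows. This is one possible way to do it, not a definitive implementation: the names new_gzip_decompressor and decompress_unterminated_members are invented here, the whole file is assumed to fit in memory, and no integrity checking is possible for streams that lack a trailer. Instead of feeding one byte at a time, it feeds everything up to the next candidate 1f 8b 08 in one call, so an error caused by a new header shows up on the first byte of a segment and no output is lost.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip ID1, ID2, and CM=8 (deflate)


def new_gzip_decompressor():
    # 16 + MAX_WBITS tells zlib to expect a gzip header.
    return zlib.decompressobj(16 + zlib.MAX_WBITS)


def decompress_unterminated_members(raw):
    """Decompress back-to-back gzip streams that may lack a final block and trailer."""
    out = bytearray()
    d = new_gzip_decompressor()
    fresh = True  # True while d has not consumed any input yet
    pos = 0
    while pos < len(raw):
        # Feed everything up to the next candidate header in one call.
        nxt = raw.find(GZIP_MAGIC, pos + 1)
        end = nxt if nxt != -1 else len(raw)
        segment = raw[pos:end]
        try:
            out += d.decompress(segment)
        except zlib.error:
            # Restart only if a used decompressor choked on something that
            # looks like a new header; anything else is treated as corruption.
            if fresh or not segment.startswith(GZIP_MAGIC):
                raise
            d = new_gzip_decompressor()
            fresh = True
            continue  # retry the same segment with the new decompressor
        fresh = False
        if d.eof:
            # This stream was properly terminated; resume right after its trailer.
            pos = end - len(d.unused_data)
            d = new_gzip_decompressor()
            fresh = True
        else:
            pos = end
    out += d.flush()
    return bytes(out)
```

To use it on the file from the question, read file.gz in binary mode and pass the bytes to decompress_unterminated_members. A chance occurrence of 1f 8b 08 inside compressed data is harmless here, because the current decompressor simply keeps consuming it without raising.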

    More importantly you should locate and contact the source of these files and say "Hey, WTF?"