Tags: python, zip, compression, gzip, pkzip

How to decompress raw PKZIP data without a ZIP header in Python?


I want to decompress raw data from a file in an exotic format, but I know that the compression method is the same one used in a ZIP file (PKZIP).

The PK\03\04 signature is missing from the file. From that point on, the data more or less fits the PKZIP local file header spec:

https://docs.fileformat.com/compression/zip/

  1. 2 bytes - version = 0x0014 (I don't know if it's meaningful)
  2. 2 bytes - flags = 0
  3. 2 bytes - compression method = 0x0008 ("deflated" according to the ZIP docs)
  4. 4 bytes - random (where the modification time and date should be)
  5. 4 bytes - random (where the CRC32 should be)
  6. 4 bytes - valid compressed size
  7. 4 bytes - valid uncompressed size
  8. 2 bytes - file name length = 0x14
  9. 2 bytes - extra field length = 0
  10. file name - 20 random bytes

Then the raw compressed data, and after that the End Record that looks damaged in a similar way. After adding the signature and valid file name characters and saving the buffer to a file, I was able to decompress it with 7zip. It showed an error dialog, but produced an uncompressed file. The resulting file contained the expected data.

I know that there is always one compressed file and the compression method is fixed. The file name is not important, so I guess it should be possible to process only the compressed data bytes after the header, ignoring the End Record as well.
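To make the idea concrete, here is a rough sketch of how the compressed bytes could be located using only the header fields listed above. It assumes the header starts at a known offset, that the fields are little-endian as in a regular ZIP file, and that the compressed size field can be trusted. The names `HEADER` and `split_payload` are made up for illustration.

```python
import struct

# Local file header as listed above, minus the 4-byte PK\03\04 signature:
# version, flags, method, 4 bytes of time/date, CRC32, compressed size,
# uncompressed size, file name length, extra field length (26 bytes total).
# Little-endian layout is an assumption carried over from the ZIP format.
HEADER = struct.Struct("<HHHIIIIHH")

def split_payload(buf, offset=0):
    """Return (compression_method, compressed_bytes) for the single entry."""
    (_version, _flags, method, _mtime, _crc,
     comp_size, _uncomp_size, name_len, extra_len) = HEADER.unpack_from(buf, offset)
    # The payload starts right after the fixed header, the name and the extra field.
    start = offset + HEADER.size + name_len + extra_len
    # Slicing by compressed size leaves the damaged End Record out entirely.
    return method, buf[start:start + comp_size]
```

The CRC32 and file name are read but never checked, which matches the "no CRC check, no file names" requirement.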

Which Python package provides such functionality?

I want to ignore the ZIP headers and pass only the compressed data buffer to some function in Python (possibly specifying the compression method and some flags), and get the uncompressed data buffer back. No CRC check, no file names.


Solution

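  • If the compression method is 8, then you can use Python's zlib module, passing wbits=-15 to either zlib.decompress() or zlib.decompressobj(). A negative wbits value tells zlib to expect a raw deflate stream with no zlib header or trailing checksum, which is how deflated data is stored inside a ZIP entry.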
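
A minimal sketch of that suggestion, assuming the compressed bytes have already been cut out of the file (the function name `inflate_raw` is made up for illustration). It uses zlib.decompressobj() so that any bytes following the end of the deflate stream, such as a leftover End Record, end up in `unused_data` instead of being decoded:

```python
import zlib

def inflate_raw(payload):
    """Decompress a raw deflate stream (ZIP compression method 8)."""
    # wbits=-15: raw deflate, 32 KiB window, no header and no checksum.
    decomp = zlib.decompressobj(wbits=-15)
    data = decomp.decompress(payload)
    data += decomp.flush()
    # Anything after the end of the stream is left in decomp.unused_data.
    return data

# Round trip with a raw deflate stream produced by zlib itself.
original = b"expected data " * 50
comp = zlib.compressobj(wbits=-15)
raw = comp.compress(original) + comp.flush()
assert inflate_raw(raw) == original
```

No CRC is verified here, so a corrupted stream that still decodes would go unnoticed.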