
Extract gzip files from zip archive


We are building a web service where you can upload a zip file (sometimes quite large, 100 MB to 1 GB) whose contents will then be served over HTTP.

Contents are served with static gzip compression. As I understand it, gzip is essentially some headers plus a deflate stream. Zip is also some metadata plus multiple optionally compressed streams, most of which are usually deflate as well.

I am concerned that we are doing an unnecessary round trip there: unpacking the zip, then compressing every file again with gzip. In theory we could just slice the zip into its deflate chunks, add some headers, and voilà, we have gzip-compressed files without doing any actual compression. That sounds like something someone else has already done, so my question is:

Is there a command-line tool for Linux, or a library for Ruby/Node.js/C++, that, given a .zip file, will create a folder with its contents along with gzipped versions of those contents, without doing unnecessary recompression?


Solution

  • With the disclaimer that I have not reviewed or tested it, zip2gz is a Python project published on GitHub that extracts the compressed data blobs from a ZIP file without decompressing them. In particular, for files stored with "deflate" compression "it will take that raw deflate data and slap a gzip header and footer around it".

    Translating the code to another language should be straightforward, except maybe for the import zipfile dependency, which would have to be remapped to the zip library or built-in zip support of the target language (though the only part actually used is the parsing of the ZIP headers and central directory, not any compression or decompression).

    For an example in C that does the reverse conversion (from gzip to a single-entry zip file) without recompressing and without any external libraries, see Mark Adler's answer to Add .gz file to .zip archive without decompressing and re-compressing?.
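
To make the idea concrete, here is a minimal Python sketch of the wrapping step. It is not the zip2gz code, and the function name zip_entry_to_gzip is made up for illustration. It assumes the entry is unencrypted and deflate-compressed, takes the sizes and CRC-32 from the central directory via the standard zipfile module, and writes a bare 10-byte gzip header with no filename or timestamp. It relies on ZIP and gzip using the same CRC-32, so the checksum can be copied as-is.

```python
import struct
import zipfile


def zip_entry_to_gzip(zip_path, member_name):
    """Return gzip bytes for one deflate-compressed ZIP member.

    Hypothetical helper: the compressed payload is copied verbatim,
    nothing is inflated or deflated.
    """
    with zipfile.ZipFile(zip_path) as zf:
        info = zf.getinfo(member_name)  # metadata from the central directory

    if info.compress_type != zipfile.ZIP_DEFLATED:
        raise ValueError("entry is not deflate-compressed")
    if info.flag_bits & 0x1:
        raise ValueError("entry is encrypted")

    with open(zip_path, "rb") as f:
        # The local file header is 30 fixed bytes, followed by the file
        # name and the extra field; the raw deflate data comes after them.
        f.seek(info.header_offset)
        signature, name_len, extra_len = struct.unpack("<4s22xHH", f.read(30))
        if signature != b"PK\x03\x04":
            raise ValueError("bad local file header signature")
        f.seek(name_len + extra_len, 1)
        deflate_data = f.read(info.compress_size)

    # gzip header: magic 1f 8b, method 8 (deflate), no flags,
    # mtime 0 (unset), extra flags 0, OS 255 (unknown).
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, 0, 0, 0, 255)
    # gzip trailer: CRC-32 of the uncompressed data, then its size mod 2**32.
    trailer = struct.pack("<II", info.CRC, info.file_size & 0xFFFFFFFF)
    return header + deflate_data + trailer
```

Writing the returned bytes to a file named after the member with a .gz suffix should give something gunzip accepts. For archives in the 100 MB to 1 GB range you would probably want to stream the payload in chunks rather than read it into memory, and entries stored without compression would still need a real gzip pass.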