
Random access to chunks of a file inside a ZIP


I've been experimenting with the ZIP format, specifically random access to contents inside it.

I know that ZIP supports random access, but AFAIK only to entire files inside the ZIP archive.

I was wondering if it is possible to load only a chunk of a file inside a ZIP archive, without loading the entire subfile into memory.

Note: I am working only with uncompressed ZIP files.


Solution

  • If you are running on Windows or a POSIX-compatible system (like Linux), you can use memory-mapped files. With this approach, the ZIP file is mapped into your process's virtual address space, and the operating system loads only the pages you actually touch, so you can access any part of its content without reading and parsing the whole file into memory. You can find more information here and there. Most modern operating systems support this.
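A minimal Python sketch of the memory-mapped approach, assuming the target entry is stored (uncompressed) and not encrypted. It uses the standard `zipfile` module only to look up the entry's local header offset from the central directory, then slices the requested bytes straight out of an `mmap` of the archive. The helper name `read_chunk_mmap` is hypothetical.

```python
import mmap
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"  # local file header signature
LOCAL_FIXED = 30           # size of the fixed part of a local file header


def read_chunk_mmap(zip_path, member, offset, length):
    """Hypothetical helper: return up to `length` bytes at `offset` inside a stored member."""
    # Opening the ZipFile parses the central directory, not the entry data.
    with zipfile.ZipFile(zip_path) as zf:
        info = zf.getinfo(member)
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError(f"{member!r} is compressed; raw bytes would not be file content")
    if info.flag_bits & 0x1:
        raise ValueError(f"{member!r} is encrypted")
    # Clamp the requested range to the size of the entry.
    start = max(0, min(offset, info.file_size))
    end = max(start, min(start + length, info.file_size))
    with open(zip_path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        hdr = info.header_offset
        if mm[hdr:hdr + 4] != LOCAL_SIG:
            raise ValueError("bad local file header")
        # File name and extra field lengths live at bytes 26..29 of the local header.
        name_len, extra_len = struct.unpack_from("<HH", mm, hdr + 26)
        data_start = hdr + LOCAL_FIXED + name_len + extra_len
        # Slicing copies only this range; the OS pages in just what is touched.
        return mm[data_start + start:data_start + end]
```

For example, `read_chunk_mmap("archive.zip", "big.bin", 1_000_000, 4096)` (hypothetical names) returns at most 4096 bytes starting 1,000,000 bytes into `big.bin`, without reading the rest of that entry.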

    While memory-mapped files are convenient because they integrate with many existing tools, you can also read the file yourself using low-level seeks and reads. Since the entries are not compressed, you can:

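    • first, read the end of central directory record (at the end of the ZIP) and the central directory it points to, to find the offset of the target file's local header;
    • then, get the size of the file and check that it is actually stored without compression (compression method 0), preferably from its central directory entry, since the local header may hold zero sizes when a data descriptor is used;
    • finally, read the target data chunk relative to the start of the file data, which begins right after the local file header (30 bytes plus the file name and extra field lengths).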

    Stored (uncompressed) entries are written contiguously in the archive, so chunks of them can be retrieved safely this way.
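The three steps above can be sketched in Python with only `seek`/`read` and `struct`, without the `zipfile` module. This is an illustrative sketch, not a complete ZIP reader: it assumes a single-disk archive with no ZIP64 records and no data prepended before the first entry, and it rejects compressed or encrypted entries. The helper name `read_stored_chunk` is hypothetical.

```python
import struct

EOCD_SIG = b"PK\x05\x06"     # end of central directory record
CDIR_SIG = b"PK\x01\x02"     # central directory file header
LOCAL_SIG = b"PK\x03\x04"    # local file header
EOCD_FMT = "<4s4H2LH"        # 22-byte EOCD record
CDIR_FMT = "<4s6H3L5H2L"     # 46-byte fixed part of a central directory entry
CDIR_FIXED = 46
LOCAL_FIXED = 30             # fixed part of a local file header


def read_stored_chunk(path, member, offset, length):
    """Hypothetical helper: read up to `length` bytes at `offset` inside a stored member.

    Assumes a single-disk, non-ZIP64 archive with nothing prepended to it.
    """
    with open(path, "rb") as f:
        # Step 1: the EOCD record is in the last 22 bytes plus up to 65535 bytes of comment.
        f.seek(0, 2)
        file_size = f.tell()
        tail_len = min(file_size, 22 + 0xFFFF)
        f.seek(file_size - tail_len)
        tail = f.read(tail_len)
        # Search only where a full 22-byte record still fits; take the last match.
        pos = tail.rfind(EOCD_SIG, 0, len(tail) - 18)
        if pos < 0:
            raise ValueError("end of central directory record not found")
        _, _, _, _, total, cd_size, cd_offset, _ = struct.unpack_from(EOCD_FMT, tail, pos)

        # Step 1 (cont.): walk the central directory to find the member's entry.
        f.seek(cd_offset)
        cdir = f.read(cd_size)
        p = 0
        for _ in range(total):
            (sig, _, _, flags, method, _, _, _, _, usize,
             name_len, extra_len, comment_len, _, _, _, local_off) = struct.unpack_from(CDIR_FMT, cdir, p)
            if sig != CDIR_SIG:
                raise ValueError("corrupt central directory")
            raw_name = cdir[p + CDIR_FIXED:p + CDIR_FIXED + name_len]
            # Flag bit 11 marks UTF-8 names; otherwise names are CP437.
            name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")
            if name == member:
                break
            p += CDIR_FIXED + name_len + extra_len + comment_len
        else:
            raise KeyError(member)

        # Step 2: only stored (method 0), unencrypted (flag bit 0 clear) entries can be sliced raw.
        if method != 0:
            raise ValueError(f"{member!r} uses compression method {method}, not stored")
        if flags & 0x1:
            raise ValueError(f"{member!r} is encrypted")
        f.seek(local_off)
        local = f.read(LOCAL_FIXED)
        if local[:4] != LOCAL_SIG:
            raise ValueError("bad local file header")
        # Use the local header's own name/extra lengths; they may differ from the central directory's.
        lname, lextra = struct.unpack_from("<HH", local, 26)
        data_start = local_off + LOCAL_FIXED + lname + lextra

        # Step 3: clamp the request to the entry size and read just that chunk.
        start = max(0, min(offset, usize))
        count = max(0, min(length, usize - start))
        f.seek(data_start + start)
        return f.read(count)
```

The size is taken from the central directory entry rather than the local header, because an entry written with a data descriptor may carry zero sizes in its local header.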

    You can find more information about the ZIP file format here.