
Java ZipInputStream skipping unused ZipEntry content, rather than draining it


I'm trying to read the content of a single ZipEntry from a zip archive as efficiently as possible. To do that, I need the standard ZipInputStream to call InputStream.skip on the content of entries I don't need, rather than draining it.

As far as I understand from the Wikipedia article on the ZIP file format:

Because the files in a ZIP archive are compressed individually it is possible to extract them, or add new ones, without applying compression or decompression to the entire archive. This contrasts with the format of compressed tar files, for which such random-access processing is not easily possible.

From this I assume that the amount of data to skip for an unneeded entry can be determined without decompressing that entry's content.

However, I see that both ZipInputStream (Java standard library) and ZipArchiveInputStream (Apache Commons Compress) drain the stream until the next entry rather than skipping it, which makes my use case very inefficient.

I'm not very familiar with the ZIP specification, and seeing this behavior in two widely used ZIP APIs makes me think that skipping might be impossible.

Is my understanding incorrect, and is such behavior simply not possible? If so, what Java API do you suggest for reading zip entries efficiently?


Solution

  • The problem here is that ZipInputStream is a stream. You start by reading the LOC (local file header) for the first entry, then read the entry itself (decompress, verify the checksum, etc.), and repeat until there are no more entries (or rather, no more LOCs).

    The directory for the whole zip, which is what enables random access (or listing the archive's structure), sits at the end of the file. When streaming data, you can't access the end of the stream, so even if you could seek, you wouldn't know where to seek to. You have to decompress to know where the data for an entry ends; only then do you reach the LOC for the next entry, and so on.

    A duplicate of this question notes that the only source of truth is the central directory, so the compressed size recorded for an entry can't be relied on for skipping anyway.
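
If the archive is available as a file on disk rather than only as a stream, java.util.zip.ZipFile can be used instead: it locates entries through the central directory, so a single entry can be read without decompressing the ones before it. The following is a minimal sketch under that assumption; the class name ZipRandomAccess and the helper readEntry are made up for illustration, and it needs Java 11 or later for Path.of and InputStream.readAllBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipRandomAccess {

    /**
     * Reads one entry by name (hypothetical helper, not part of any library).
     * ZipFile looks the entry up via the central directory, so the data of
     * the other entries does not have to be decompressed to reach it.
     */
    static byte[] readEntry(Path zipPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException("No such entry: " + entryName);
            }
            // Only this entry's compressed data is read and inflated.
            try (InputStream in = zip.getInputStream(entry)) {
                return in.readAllBytes();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipRandomAccess <archive.zip> <entry-name>
        byte[] data = readEntry(Path.of(args[0]), args[1]);
        System.out.println(new String(data, StandardCharsets.UTF_8));
    }
}
```

This only helps when the whole archive can be accessed randomly. If the zip really arrives as a non-seekable stream (for example over a socket), the limitation described above still applies, and the usual workaround is to spool the stream to a temporary file first.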