java, apache, apache-commons, apache-commons-compress

ZipArchiveInputStream created from InputStream, unable to read the content


Working code:

// Read the entry's content directly from the stream that ZipFile provides.
InputStream is = zipFile.getInputStream(zipArchiveEntry);
BufferedReader br = new BufferedReader(new InputStreamReader(is));

String line;
while ((line = br.readLine()) != null) {
    System.out.println(line);
}

Not working code:

    InputStream is = zipFile.getInputStream(zipArchiveEntry);

    ZipArchiveInputStream zis = new ZipArchiveInputStream(is);
    if (zis.canReadEntryData(zipArchiveEntry)) {
        // Start
        BufferedReader br = new BufferedReader(new InputStreamReader(zis));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
    }

The idea is that, rather than reading from the InputStream directly, I create a ZipArchiveInputStream from it so that I can use the canReadEntryData() method. canReadEntryData() works fine and returns true for normal files, but I am not able to read any content from the ZipArchiveInputStream.

Please help. Kindly point out where I am going wrong.


Solution

  • ZipArchiveInputStream vs ZipFile

    It appears that ZipArchiveInputStream has some limitations as stated by the official documentation:

    ZIP archives store archive entries in sequence and contain a registry of all entries at the very end of the archive. It is acceptable for an archive to contain several entries of the same name and have the registry (called the central directory) decide which entry is actually to be used (if any).

    In addition the ZIP format stores certain information only inside the central directory but not together with the entry itself, this is:

    • internal and external attributes
    • different or additional extra fields

    This means the ZIP format cannot really be parsed correctly while reading a non-seekable stream, which is what ZipArchiveInputStream is forced to do. As a result ZipArchiveInputStream

    • may return entries that are not part of the central directory at all and shouldn't be considered part of the archive.
    • may return several entries with the same name.
    • will not return internal or external attributes.
    • may return incomplete extra field data.

    ZipArchiveInputStream shares these limitations with java.util.zip.ZipInputStream.
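To make the sequential behaviour concrete, here is a minimal sketch. It uses the JDK's java.util.zip.ZipInputStream as a stand-in, since the quoted documentation says it shares these limitations; it is not Commons Compress itself. The class name StreamWalkSketch and the entry names are made up for illustration. The reader hands out entries only in the order they appear in the stream, one getNextEntry() call at a time, and never consults the central directory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; uses only the JDK, not Commons Compress.
class StreamWalkSketch {

    // Builds a small in-memory ZIP with two made-up entries.
    static byte[] buildZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write("first".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("b.txt"));
            zos.write("second".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Walks the archive front to back and collects names in stream order.
    // Each entry becomes readable only after getNextEntry() has positioned on it.
    static List<String> entryNames(byte[] zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(entryNames(buildZip()));
    }
}
```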

    ZipFile is able to read the central directory first and provide correct and complete information on any ZIP archive.

    ZIP archives know a feature called the data descriptor which is a way to store an entry's length after the entry's data. This can only work reliably if the size information can be taken from the central directory or the data itself can signal it is complete, which is true for data that is compressed using the DEFLATED compression algorithm.

    ZipFile has access to the central directory and can extract entries using the data descriptor reliably. The same is true for ZipArchiveInputStream as long as the entry is DEFLATED. For STORED entries ZipArchiveInputStream can try to read ahead until it finds the next entry, but this approach is not safe and has to be enabled by a constructor argument explicitly.
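The data descriptor point can be illustrated with a small sketch, again using JDK classes (java.util.zip) as stand-ins rather than Commons Compress. It assumes the JDK's ZipOutputStream writes a data descriptor for a DEFLATED entry whose size was not set in advance. Under that assumption the streaming reader should not know the size right after it has read the entry header, while ZipFile should get it from the central directory. DataDescriptorSketch and the sample entry are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; uses only the JDK, not Commons Compress.
class DataDescriptorSketch {

    // Writes one DEFLATED entry without a preset size, so the sizes are
    // expected to be stored after the entry's data (data descriptor).
    static byte[] buildZip(String name, String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(text.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Size as seen by the streaming reader right after the entry header;
    // -1 means "unknown" in java.util.zip.
    static long sizeFromStream(byte[] zip) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            return zis.getNextEntry().getSize();
        }
    }

    // Size as seen by ZipFile, which reads the central directory first.
    static long sizeFromZipFile(byte[] zip, String name) throws IOException {
        Path tmp = Files.createTempFile("descriptor-sketch", ".zip");
        try {
            Files.write(tmp, zip);
            try (ZipFile zf = new ZipFile(tmp.toFile())) {
                return zf.getEntry(name).getSize();
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildZip("note.txt", "hello zip");
        System.out.println("stream:  " + sizeFromStream(zip));
        System.out.println("zipfile: " + sizeFromZipFile(zip, "note.txt"));
    }
}
```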

    Conclusion:

    If possible, you should always prefer ZipFile over ZipArchiveInputStream.

    I believe that by ZipFile, the sentence above means using an InputStream obtained from a ZipFile:

    InputStream is = zipFile.getInputStream(zipArchiveEntry);
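A rough sketch of the difference, using the JDK's java.util.zip.ZipFile and ZipInputStream as stand-ins for the Commons Compress classes (an assumption made so the example needs no extra dependency). The stream returned by getInputStream() already carries the entry's decompressed content, so it can be read directly. Wrapping it in a second ZIP reader and reading without ever calling getNextEntry(), as the question's second snippet does, is likely why nothing is printed there. EntryStreamSketch, the file name, and the text are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; uses only the JDK, not Commons Compress.
class EntryStreamSketch {

    // Writes a one-entry ZIP to a temporary file and returns its path.
    static Path writeTempZip(String name, String text) throws IOException {
        Path tmp = Files.createTempFile("entry-sketch", ".zip");
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(tmp))) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(text.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return tmp;
    }

    // Reads the entry's text straight from the stream ZipFile hands out.
    static String readDirect(ZipFile zf, ZipEntry entry) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(zf.getInputStream(entry), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Mirrors the question's second snippet: wrap the entry stream in another
    // ZIP reader and read without calling getNextEntry() first.
    static int readThroughSecondZipReader(ZipFile zf, ZipEntry entry) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(zf.getInputStream(entry))) {
            return zis.read(); // no current entry, so end-of-stream is reported
        }
    }

    public static void main(String[] args) throws IOException {
        Path zip = writeTempZip("note.txt", "line one\nline two\n");
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            ZipEntry entry = zf.getEntry("note.txt");
            System.out.print(readDirect(zf, entry));
            System.out.println(readThroughSecondZipReader(zf, entry));
        } finally {
            Files.deleteIfExists(zip);
        }
    }
}
```

In Commons Compress, ZipFile also offers canReadEntryData(ZipArchiveEntry), so the check from the question can be made on the ZipFile itself before calling getInputStream(), without a ZipArchiveInputStream.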