Tags: java, android, zip, compression, skyepub

How to access a zipEntry from a streamed zip file in memory


I'm currently implementing an e-reader library (skyepub) that requires me to implement a method that checks whether a zipEntry exists. In their demo version, the solution is simple:

public boolean isExists(String baseDirectory,String contentPath) {
    setupZipFile(baseDirectory,contentPath);
    if (this.isCustomFont(contentPath)) {
        String path = baseDirectory +"/"+ contentPath;
        File file = new File(path);
        return file.exists();
    }

    ZipEntry entry = this.getZipEntry(contentPath);
    return entry != null;
}

// Entry name should start without / like META-INF/container.xml 

private ZipEntry getZipEntry(String contentPath) {

    if (zipFile==null) return null;

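    // Assumes contentPath starts with a separator, so subDirs[1] is its first directory segment
    String[] subDirs = contentPath.split(Pattern.quote(File.separator));

    // Drop that segment and the "//" it leaves behind to get the entry name inside the archive
    String corePath = contentPath.replace(subDirs[1], "");

    corePath=corePath.replace("//", "");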

    ZipEntry entry = zipFile.getEntry(corePath.replace(File.separatorChar, '/'));

    return entry;

}

As you can see, getZipEntry(contentPath) gives you the ZipEntry in question in O(1) time.

However, in my case I cannot read the zip file straight from the file system (for security reasons it must be read from memory). So my isExists implementation walks through the zip file one entry at a time until it finds the entry in question. Here is the relevant part:

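// Assumes the enclosing method returns the entry's bytes as byte[]
try {
    final InputStream stream = dbUtil.getBookStream(bookEditionID);
    if (stream == null) return null;

    // Walk the entries one by one until the requested name is found: O(n)
    try (ZipInputStream zip = new ZipInputStream(stream)) {
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.getName().equals(zipEntryName)) {
                // Read the matching entry's uncompressed bytes into memory
                ByteArrayOutputStream data = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int count;
                while ((count = zip.read(buffer)) != -1) {
                    data.write(buffer, 0, count);
                }
                return data.toByteArray();
            }
        }
    }
    return null; // no entry with that name
} catch (IOException e) {
    Log.e("demo", "Can't get content data for " + contentPath);
    return null;
}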

If data is non-null, isExists returns true; otherwise it returns false.

Question

Is there a way to find the zip entry in question in a ZipInputStream in O(1) time rather than O(n) time?

Related

See this question and this answer.


Solution

  • An entry in a zip archive cannot really be loaded in O(1) time. The structure of a zip archive looks like this:

      [local file header 1]
      [encryption header 1]
      [file data 1]
      [data descriptor 1]
      ... 
      [local file header n]
      [encryption header n]
      [file data n]
      [data descriptor n]
      [archive decryption header] 
      [archive extra data record] 
      [central directory header 1]
      .
      [central directory header n]
      [zip64 end of central directory record]
      [zip64 end of central directory locator] 
      [end of central directory record]
    

    In short, an archive consists of compressed files, each with its own headers, plus a "central directory" that holds the metadata for all the files (the central directory headers). The only valid way to locate an entry is to scan the central directory (more info):

    ...must not scan for entries from the top of the ZIP file, because only the central directory specifies where a file chunk starts

    Because there is no index over central directory headers, you can only get an entry in O(n) where n is the number of files in the archive.

    Update: Unfortunately, every zip library I know of that works with streams rather than files relies on the local file headers and scans the entire stream, contents included. Such libraries cannot easily be bent to do otherwise, either. The only way I found to avoid scanning the entire archive is to adapt a library yourself.

    Update 2: I have taken the liberty of modifying the zip4j library for your purposes. Assuming you have read your zip file into a byte array and added a dependency on zip4j version 1.3.2, you can use MemoryHeaderReader and RandomByteStream like this:

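    import net.lingala.zip4j.model.FileHeader;
    import net.lingala.zip4j.model.ZipModel;
    import net.lingala.zip4j.util.Zip4jUtil;
    // MemoryHeaderReader and RandomAccessStream come from the modified zip4j, not stock 1.3.2

    String myZipFile = "...";   // name of the entry to look up
    byte[] bytes = readFile();  // the whole archive, already in memory
    MemoryHeaderReader headerReader = new MemoryHeaderReader(RandomAccessStream.fromBytes(bytes));
    ZipModel zipModel = headerReader.readAllHeaders(); // parses the headers, not the file data
    FileHeader myFile = Zip4jUtil.getFileHeader(zipModel, myZipFile);
    boolean fileIsPresent = myFile != null;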
    

    It works in O(entryCount) without scanning the file data of the entire archive, which should be reasonably fast. I haven't tested it thoroughly, but it should give you an idea of how to adapt zip4j for your purposes.
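If patching zip4j is not an option, the same idea (jump to the end of central directory record, then walk only the central directory headers) can be sketched in plain Java over the in-memory byte array. This is a swapped-in, library-free technique, not the answer's zip4j code: the class name `CentralDirectory` and method `entryNames` are hypothetical, and the sketch assumes a single-disk, non-ZIP64 archive with UTF-8 entry names. It is also not hardened against malformed input.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipException;

/** Hypothetical helper: reads entry names from the central directory of an in-memory zip. */
public final class CentralDirectory {
    private static final int EOCD_SIGNATURE = 0x06054b50; // end of central directory record
    private static final int CDH_SIGNATURE = 0x02014b50;  // central directory file header
    private static final int EOCD_MIN_SIZE = 22;          // EOCD record without its comment
    private static final int CDH_FIXED_SIZE = 46;         // CD header before the variable fields

    private CentralDirectory() {}

    /**
     * Returns the names of all entries listed in the central directory.
     * Not handled (assumptions): ZIP64, multi-disk archives, non-UTF-8 (CP437) names.
     */
    public static Set<String> entryNames(byte[] zip) throws ZipException {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);

        // 1. Scan backwards for the EOCD record; it may be followed by a comment
        //    of up to 65535 bytes, so the search window is bounded.
        int eocd = -1;
        int stop = Math.max(0, zip.length - EOCD_MIN_SIZE - 0xFFFF);
        for (int i = zip.length - EOCD_MIN_SIZE; i >= stop; i--) {
            if (buf.getInt(i) == EOCD_SIGNATURE) {
                eocd = i;
                break;
            }
        }
        if (eocd < 0) {
            throw new ZipException("End of central directory record not found");
        }

        // 2. Read the total entry count and the offset where the central directory starts.
        int entryCount = buf.getShort(eocd + 10) & 0xFFFF;
        long cdOffset = buf.getInt(eocd + 16) & 0xFFFFFFFFL;

        // 3. Walk the central directory headers only; compressed file data is never read.
        Set<String> names = new HashSet<>();
        int pos = (int) cdOffset;
        for (int n = 0; n < entryCount; n++) {
            if (pos < 0 || pos + CDH_FIXED_SIZE > zip.length || buf.getInt(pos) != CDH_SIGNATURE) {
                throw new ZipException("Bad central directory header at offset " + pos);
            }
            int nameLen = buf.getShort(pos + 28) & 0xFFFF;
            int extraLen = buf.getShort(pos + 30) & 0xFFFF;
            int commentLen = buf.getShort(pos + 32) & 0xFFFF;
            names.add(new String(zip, pos + CDH_FIXED_SIZE, nameLen, StandardCharsets.UTF_8));
            pos += CDH_FIXED_SIZE + nameLen + extraLen + commentLen;
        }
        return names;
    }
}
```

Because the sketch returns a `HashSet`, the O(n) pass over the central directory can be done once per book, after which each existence check is `names.contains(entryName)`, an expected O(1) lookup. That is roughly what `java.util.zip.ZipFile` does when it opens a file, which is why `getEntry` looks O(1) in the demo code.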