
Get file in folder in zip as string in java


I can get a text file as String with new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8). How do I achieve the same result if the file is in a folder which is in a zip file? I know I can get the zip as a ZipFile and the folder as a ZipEntry but I'm not clear on how I get the file nor how I make a String out of it. I don't want to create any files or folders to get it.

EDIT: Per dpr's answer, here's what I used:

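String fileAsString;
try (ZipFile zip = new ZipFile(path)) {
    // Try the forward-slash name first, then fall back to the backslash variant.
    ZipEntry entry = zip.getEntry("folder/file.txt");
    if (entry == null) entry = zip.getEntry("folder\\file.txt");
    // Fail clearly instead of passing null to getInputStream.
    if (entry == null) throw new java.io.FileNotFoundException("folder/file.txt not found in " + path);
    try (InputStream is = zip.getInputStream(entry);
         Scanner s = new Scanner(is, "UTF-8").useDelimiter("\\A")) {
        fileAsString = s.hasNext() ? s.next() : "";
    }
}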

Solution

  • Technically, a zip file contains no directories as such. Everything in it is an entry (ZipEntry in Java), and the isDirectory method tells you whether a given entry represents a directory of the zipped file system structure or a regular file. The name attribute of a ZipEntry always reflects the full directory hierarchy of the originally zipped file relative to the archive's root. So for a file Data\Folder1\example.txt you will typically have three ZipEntries in your zip file: one for Data, one for Data\Folder1 and one for Data\Folder1\example.txt. Directory entries are optional, though, so some archives contain only the entry for the file itself.
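To make the entry model concrete, here is a minimal sketch that writes a small sample archive and lists its entries. The class and method names (ZipEntryListing, describeEntries, writeSampleZip) and the sample content are made up for illustration. It assumes the archive uses forward slashes, which is what ZipEntry.isDirectory relies on: it reports an entry as a directory when its name ends with '/'.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ZipEntryListing {

    /** Returns one line per entry, prefixed with "dir  " or "file ". */
    static List<String> describeEntries(File zip) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipFile zipFile = new ZipFile(zip)) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                lines.add((entry.isDirectory() ? "dir  " : "file ") + entry.getName());
            }
        }
        return lines;
    }

    /** Writes a hypothetical sample archive to a temporary file. */
    static File writeSampleZip() throws IOException {
        File zip = File.createTempFile("sample", ".zip");
        zip.deleteOnExit();
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip.toPath()))) {
            // Directory entries carry no data; their names end with '/'.
            out.putNextEntry(new ZipEntry("Data/"));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("Data/Folder1/"));
            out.closeEntry();
            // The file entry's name contains the full path inside the archive.
            out.putNextEntry(new ZipEntry("Data/Folder1/example.txt"));
            out.write("hello".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return zip;
    }

    public static void main(String[] args) throws IOException {
        describeEntries(writeSampleZip()).forEach(System.out::println);
    }
}
```

Leaving out the two directory entries in writeSampleZip would still produce a valid archive, which is why it is safer to look for the file entry by its full name than to walk the folder entries.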

    By simply iterating over the ZipEntries of your ZipFile and matching the path and file name of your desired file, you should easily find the desired entry. The contents of this entry can then be extracted using the already suggested ZipFile.getInputStream(ZipEntry) method.

    See this question and its answers for examples of how to read an InputStream into a String.
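As one possible variant that needs no third-party library, the sketch below copies the stream into a ByteArrayOutputStream and decodes the bytes in one go. The class and method names (StreamText, readFully) are hypothetical, and the caller is assumed to close the stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

class StreamText {

    /** Reads the whole stream and decodes it with the given charset. Does not close the stream. */
    static String readFully(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read() returns -1 once the end of the stream is reached.
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        // Decode once at the end so multi-byte characters are never split across chunks.
        return new String(buffer.toByteArray(), charset);
    }
}
```

Decoding only after all bytes are collected avoids corrupting characters that happen to straddle a buffer boundary, at the cost of holding the whole entry in memory.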

    Using Apache Commons IO (IOUtils) to read the InputStream into a String, this could look something like this:

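    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.Enumeration;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;
    import org.apache.commons.io.IOUtils;

    // Returns the contents of the named entry as a String, or null if no such file entry exists.
    public String getFileContentsAsString(final File pZipFile, final String pFileName) throws IOException {
        try (ZipFile zipFile = new ZipFile(pZipFile)) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry currentEntry = entries.nextElement();
                if (matchesDesiredFile(pFileName, currentEntry)) {
                    try (InputStream entryIn = zipFile.getInputStream(currentEntry)) {
                        return IOUtils.toString(entryIn, StandardCharsets.UTF_8);
                    }
                }
            }
        }
        return null;
    }

    // Matches regular file entries only; the name must equal the full path inside the archive.
    private boolean matchesDesiredFile(final String pFileName, final ZipEntry pZipEntry) {
        return !pZipEntry.isDirectory() && pZipEntry.getName().equals(pFileName);
    }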
    

    If you're simply matching against the name attribute of the entry, you could of course also use

    ZipEntry zipEntry = zipFile.getEntry(filePathWithinZipArchive);
    

    to get the desired entry instead of iterating over the entries "manually".
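Put together, a getEntry-based lookup might look like the sketch below. The names ZipEntryReader and readEntryAsString are hypothetical, UTF-8 is assumed as the file's encoding, and InputStream.readAllBytes requires Java 9 or later. The sketch throws when the entry is missing rather than returning null, which is a design choice and not something the approach itself demands.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class ZipEntryReader {

    /** Reads the named entry as UTF-8 text without extracting anything to disk. */
    static String readEntryAsString(File zip, String entryName) throws IOException {
        try (ZipFile zipFile = new ZipFile(zip)) {
            ZipEntry entry = zipFile.getEntry(entryName);
            // getEntry returns null for unknown names; directory entries have no useful content.
            if (entry == null || entry.isDirectory()) {
                throw new FileNotFoundException(entryName + " is not a file entry in " + zip);
            }
            try (InputStream in = zipFile.getInputStream(entry)) {
                // Java 9+: the in-archive counterpart of Files.readAllBytes.
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }
}
```

A call such as ZipEntryReader.readEntryAsString(new File("archive.zip"), "folder/file.txt") would then replace the Files.readAllBytes one-liner from the question; both the archive name and the entry path here are placeholders.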

    Note that you should be careful about the separator character used for directories. As pointed out here, in practice it's up to the application that creates the zip file whether it uses \ (backslash) or / (forward slash) as the directory separator (the ZIP specification calls for forward slashes, but not every tool follows it). I tried this on a Mac using the zip terminal command, and both the ZipEntry's name and the original file name were Data/Folder1/example.txt. If you create the zip using a different tool, the name of the ZipEntry might be Data\Folder1\example.txt. Even mixed variants (one ZipEntry using forward slashes and another one using backslashes) are possible. You may want to take this into account if you have no control over the zip creation process.
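One way to cope with unknown separators is to normalize both names before comparing them. The sketch below does this by iterating over the entries; the names SeparatorTolerantLookup, normalize and findFile are hypothetical. It assumes that a backslash in an entry name is always meant as a separator, which may not hold for archives created on systems where a backslash is a legal file name character.

```java
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class SeparatorTolerantLookup {

    /** Treats '\' and '/' as the same separator by mapping both to '/'. */
    static String normalize(String name) {
        return name.replace('\\', '/');
    }

    /** Returns the first non-directory entry whose normalized name matches, or null if there is none. */
    static ZipEntry findFile(ZipFile zipFile, String wantedPath) {
        String wanted = normalize(wantedPath);
        Enumeration<? extends ZipEntry> entries = zipFile.entries();
        while (entries.hasMoreElements()) {
            ZipEntry entry = entries.nextElement();
            if (!entry.isDirectory() && normalize(entry.getName()).equals(wanted)) {
                return entry;
            }
        }
        return null;
    }
}
```

The returned entry can be passed to ZipFile.getInputStream as usual. Compared with trying getEntry twice, this also covers archives that mix both separators within a single entry name.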