
Exception from stream obtained with ZipFile.getInputStream(entry)


I'm trying to read some XML files from a zip file using java.util.zip.ZipFile. I was hoping to get an input stream that I could then parse with a SAX parser, but I keep getting SAX exceptions about a faulty prolog, which means I'm not getting what I expect out of the InputStream.

What am I missing?

if (path.endsWith(".zip")) {
    ZipFile file = new ZipFile(path);
    Enumeration<? extends ZipEntry> entries = file.entries();
    while (entries.hasMoreElements()) {
        methodThatHandlesXmlInputStream(file.getInputStream(entries.nextElement()));
    }
}

void methodThatHandlesXmlInputStream(InputStream input) {
    doSomethingToTheInput(input);  // reads from the stream
    tryToParseXMLFromInput(input); // this is where the exception was thrown
}

Revisited Solution: The problem was that the method handling the InputStream consumed it and then attempted to read from it again. I've learned that it is better to obtain a separate InputStream from the zip file for each consumer and handle each one separately.

ZipFile zipFile = new ZipFile(path);
Enumeration<? extends ZipEntry> entries = zipFile.entries();
while (entries.hasMoreElements()) {
    ZipEntry entry = entries.nextElement();
    // Each call to getInputStream() returns a new stream over the same entry.
    methodConsumingInput(zipFile.getInputStream(entry));
    anotherMethodConsumingSameInput(zipFile.getInputStream(entry));
}
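
When the consumers cannot each be handed the ZipFile, an alternative is to read every entry into a byte array once and give each consumer its own ByteArrayInputStream. This is only a minimal sketch: it assumes the entries are small enough to hold in memory and that Java 9 or later is available for readAllBytes(). The class name ZipEntryBuffering, the helper writeSampleZip, and the entry name a.xml are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ZipEntryBuffering {

    // Reads every non-directory entry fully into memory, keyed by entry name.
    static Map<String, byte[]> readEntries(Path zipPath) throws IOException {
        Map<String, byte[]> contents = new LinkedHashMap<>();
        try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                try (InputStream in = zipFile.getInputStream(entry)) {
                    contents.put(entry.getName(), in.readAllBytes()); // Java 9+
                }
            }
        }
        return contents;
    }

    // Hypothetical helper: writes a one-entry zip so the sketch runs on its own.
    static Path writeSampleZip() throws IOException {
        Path zip = Files.createTempFile("sample", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            out.putNextEntry(new ZipEntry("a.xml"));
            out.write("<?xml version=\"1.0\"?><root/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return zip;
    }

    public static void main(String[] args) throws IOException {
        Path zip = writeSampleZip();
        for (Map.Entry<String, byte[]> e : readEntries(zip).entrySet()) {
            // Each consumer gets its own independent stream over the same bytes.
            InputStream first = new ByteArrayInputStream(e.getValue());
            InputStream second = new ByteArrayInputStream(e.getValue());
            System.out.println(e.getName() + ": " + first.readAllBytes().length
                    + " bytes, then " + second.readAllBytes().length + " bytes");
        }
        Files.delete(zip);
    }
}
```

For large entries, calling getInputStream() once per consumer, as in the snippet above, avoids holding the whole entry in memory.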

Solution

  • "My guess is that getInputStream() returns a stream to the compressed XML file, which would be unreadable."

    If you are reading an entry that was compressed by ZIP itself, that should not happen. The ZipFile class takes care of the decompression.

    If the compression was done by something else before adding the entry to the ZIP file, then ZipFile won't be aware that it is compressed. You will need to:

    1. Figure out what compression scheme was used.
    2. Decompress the stream yourself before you attempt to parse it. For example, wrap the result of getInputStream() in an InflaterInputStream, a GZIPInputStream, or similar.
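
The two steps above can be sketched as follows, assuming purely for illustration that the inner compression turns out to be gzip (the question does not say which scheme, if any, was used). The sketch peeks at the first two bytes of the entry and wraps the stream in a GZIPInputStream only when it sees the gzip signature 1f 8b. The class name NestedCompression, the helper writeSampleZip, and the entry name data.xml are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class NestedCompression {

    static final String XML = "<?xml version=\"1.0\"?><root/>";

    // Returns the entry's content, gunzipping it first if it starts with the gzip signature.
    static byte[] readPossiblyGzipped(ZipFile zipFile, ZipEntry entry) throws IOException {
        try (InputStream raw = new BufferedInputStream(zipFile.getInputStream(entry))) {
            raw.mark(2);              // remember the start so the peeked bytes can be re-read
            int b0 = raw.read();
            int b1 = raw.read();
            raw.reset();
            boolean gzipped = (b0 == 0x1f && b1 == 0x8b);
            try (InputStream in = gzipped ? new GZIPInputStream(raw) : raw) {
                return in.readAllBytes(); // Java 9+
            }
        }
    }

    // Hypothetical helper: stores gzip-compressed XML as a zip entry to simulate double compression.
    static Path writeSampleZip() throws IOException {
        ByteArrayOutputStream gz = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(gz)) {
            out.write(XML.getBytes(StandardCharsets.UTF_8));
        }
        Path zip = Files.createTempFile("nested", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            out.putNextEntry(new ZipEntry("data.xml"));
            out.write(gz.toByteArray());
            out.closeEntry();
        }
        return zip;
    }

    public static void main(String[] args) throws IOException {
        Path zip = writeSampleZip();
        try (ZipFile zipFile = new ZipFile(zip.toFile())) {
            byte[] xml = readPossiblyGzipped(zipFile, zipFile.getEntry("data.xml"));
            System.out.println(new String(xml, StandardCharsets.UTF_8));
        }
        Files.delete(zip);
    }
}
```

A raw deflate stream has no comparable signature, so for InflaterInputStream the scheme would have to be known in advance rather than detected this way.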

    A third possibility is that the stream is not well-formed XML ... or not XML at all.
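
One way to test this third possibility in isolation is to parse the entry's bytes with a bare SAX parser and report where the first fatal error occurs. The following is a rough sketch rather than a full validator: the class name WellFormedCheck and the method firstParseError are invented for this example, and it assumes the entry content has already been read into a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {

    // Returns null if the bytes parse as well-formed XML,
    // otherwise a description of the first fatal parse error.
    static String firstParseError(byte[] content)
            throws IOException, ParserConfigurationException, SAXException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            // DefaultHandler ignores all content events; only well-formedness matters here.
            parser.parse(new ByteArrayInputStream(content), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] good = "<?xml version=\"1.0\"?><root/>".getBytes(StandardCharsets.UTF_8);
        byte[] bad = "not xml at all".getBytes(StandardCharsets.UTF_8);
        System.out.println("good: " + firstParseError(good));
        System.out.println("bad:  " + firstParseError(bad));
    }
}
```

An error reported at line 1, column 1 suggests the content does not start with XML at all, which points back to the stream rather than to the document.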


    Suggestion: Use a ZIP tool to extract the offending ZIP entry to a local file in the file system, then use a utility like the UNIX / Linux file command to figure out what the real file type is. (Don't trust the file suffix. It might be misleading you.)
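
If extracting the entry is inconvenient, a similar check can be approximated from Java by printing the first few bytes of the stream in hex and comparing them with known file signatures. This is only a sketch: the class name MagicBytes and the method firstBytesHex are hypothetical, and it assumes Java 11 or later for readNBytes(int).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class MagicBytes {

    // Reads up to n bytes from the stream and formats them as space-separated hex.
    static String firstBytesHex(InputStream in, int n) throws IOException {
        byte[] head = in.readNBytes(n); // Java 11+; returns fewer bytes at end of stream
        StringBuilder hex = new StringBuilder();
        for (byte b : head) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<?xml version=\"1.0\"?><root/>".getBytes(StandardCharsets.UTF_8);
        // In real use, pass zipFile.getInputStream(entry) instead of this in-memory stream.
        System.out.println(firstBytesHex(new ByteArrayInputStream(xml), 5)); // 3c 3f 78 6d 6c
    }
}
```

An entry that really is XML with a declaration starts with 3c 3f 78 6d 6c ("<?xml"). Other common openings are 50 4b 03 04 for a nested zip, 1f 8b for gzip, and ef bb bf for a UTF-8 byte order mark.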