I'm trying to read some XML files from a zip file using java.util.zip.ZipFile. I was hoping to get an input stream that I could then parse with a SAX parser, but I keep getting SAX exceptions due to faulty prologs, meaning that I'm not getting what I expect out of the InputStream.
What am I missing?
if (path.endsWith(".zip")) {
    ZipFile file = new ZipFile(path);
    Enumeration<? extends ZipEntry> entries = file.entries();
    while (entries.hasMoreElements()) {
        methodThatHandlesXmlInputStream(file.getInputStream(entries.nextElement()));
    }
}

void methodThatHandlesXmlInputStream(InputStream input) {
    doSomethingToTheInput(input);
    tryToParseXMLFromInput(input); // This is where the exception was thrown
}
Revisited Solution:
The problem was that the method handling the InputStream consumed it and then attempted to read from it again. I've learned that it is better to request a separate InputStream from the zip file for each consumer and handle each one separately.
ZipFile zipFile = new ZipFile(path);
Enumeration<? extends ZipEntry> entries = zipFile.entries();
while (entries.hasMoreElements()) {
    ZipEntry entry = entries.nextElement();
    // Each call to getInputStream() returns a fresh stream for the same entry.
    methodConsumingInput(zipFile.getInputStream(entry));
    anotherMethodConsumingSameInput(zipFile.getInputStream(entry));
}
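A self-contained sketch of the same idea follows, assuming Java 9 or later (for InputStream.readAllBytes()). The class name ZipEntryStreams and the helpers readEachEntryTwice and writeSampleZip are hypothetical names made up for this illustration; the sample ZIP is generated on the fly so that nothing depends on files from the question.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ZipEntryStreams {

    /** Reads every non-directory entry twice, each time from a fresh stream. */
    static List<String> readEachEntryTwice(File zip) {
        List<String> results = new ArrayList<>();
        try (ZipFile zipFile = new ZipFile(zip)) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                // The first consumer gets its own stream ...
                try (InputStream first = zipFile.getInputStream(entry)) {
                    results.add(new String(first.readAllBytes(), StandardCharsets.UTF_8));
                }
                // ... and the second consumer gets another one that starts at byte 0 again.
                try (InputStream second = zipFile.getInputStream(entry)) {
                    results.add(new String(second.readAllBytes(), StandardCharsets.UTF_8));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return results;
    }

    /** Builds a small throwaway ZIP with a single entry (illustration only). */
    static File writeSampleZip(String entryName, String content) {
        try {
            File zip = File.createTempFile("sample", ".zip");
            zip.deleteOnExit();
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip.toPath()))) {
                out.putNextEntry(new ZipEntry(entryName));
                out.write(content.getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
            return zip;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File zip = writeSampleZip("a.xml", "<?xml version=\"1.0\"?><root/>");
        System.out.println(readEachEntryTwice(zip));
    }
}
```

Both reads return the full entry content because neither stream has been touched by the other consumer. If opening the entry twice is undesirable, buffering the bytes once and handing out ByteArrayInputStream instances would be another option.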
My guess is that getInputStream() returns a stream to the compressed XML file, which would be unreadable.
If you are reading an entry that has been compressed by ZIP, that should not happen. The ZipFile class will take care of the decompression.
If the compression was done by something else before the entry was added to the ZIP file, then ZipFile won't be aware that it is compressed. You will need to wrap the stream returned by getInputStream() with an InflaterInputStream, a GZIPInputStream, or similar, depending on how the data was compressed.
A third possibility is that the stream is not well-formed XML, or not XML at all.
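As a rough illustration of the wrapping case, the sketch below assumes the entry was gzip-compressed before it was zipped; that is only one possibility, and zlib-compressed data would call for InflaterInputStream instead. The class PreCompressedEntry and its methods are hypothetical, and a ByteArrayInputStream stands in for the stream that zipFile.getInputStream(entry) would return. It assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class PreCompressedEntry {

    /** Gzips the given bytes; stands in for whatever compressed the data before zipping. */
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Wraps a raw entry stream so that reads return the decompressed text. */
    static String readGzippedEntry(InputStream rawEntryStream) {
        try (InputStream in = new GZIPInputStream(rawEntryStream)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] stored = gzip("<root/>".getBytes(StandardCharsets.UTF_8));
        // In real code the argument would be zipFile.getInputStream(entry).
        System.out.println(readGzippedEntry(new ByteArrayInputStream(stored)));
    }
}
```

The decompressing wrapper, rather than the raw entry stream, is what would then be passed to the SAX parser.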
Suggestion: Use a ZIP tool to extract the offending ZIP entry to a local file in the file system, then use a utility like the UNIX / Linux file command to figure out what the real file type is. (Don't trust the file suffix. It might be misleading you.)
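If extracting the entry is inconvenient, a similar check can be approximated in Java by looking at the first few bytes of the entry. The sketch below is a much cruder stand-in for the file command: EntrySniffer and guessType are hypothetical names, and it only recognizes the gzip magic number, the ZIP local-file-header signature, and an ASCII-compatible XML declaration. XML without a declaration, with a byte order mark, or in UTF-16 would be reported as "unknown".

```java
import java.nio.charset.StandardCharsets;

class EntrySniffer {

    /** Returns true if data begins with the given prefix bytes. */
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Guesses a coarse type from the leading bytes of an entry. */
    static String guessType(byte[] head) {
        if (startsWith(head, new byte[] {0x1f, (byte) 0x8b})) {
            return "gzip"; // gzip magic number
        }
        if (startsWith(head, new byte[] {'P', 'K', 3, 4})) {
            return "zip"; // ZIP local file header signature
        }
        if (startsWith(head, "<?xml".getBytes(StandardCharsets.US_ASCII))) {
            return "xml"; // XML declaration in an ASCII-compatible encoding
        }
        return "unknown";
    }

    public static void main(String[] args) {
        byte[] head = "<?xml version=\"1.0\"?>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(guessType(head));
    }
}
```

In practice the head bytes would come from reading the start of zipFile.getInputStream(entry) on a stream opened just for this check, so that the stream handed to the parser is still unread.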