java character-encoding zipinputstream

ZipInputStream(InputStream, Charset) decodes ZipEntry file names incorrectly


Java 7 is supposed to fix an old problem with unpacking zip archives whose entry names use a character set other than UTF-8, via the constructor ZipInputStream(InputStream, Charset). So far, so good: I can unpack a zip archive containing file names with umlauts when I explicitly set the ISO-8859-1 character set.

But here is the problem: when iterating over the stream with ZipInputStream.getNextEntry(), the entries have wrong special characters in their names. In my case the umlaut "ü" is replaced by a "?" character. Does anybody know how to fix this? It seems that ZipEntry ignores the Charset of its underlying ZipInputStream. It looks like yet another zip-related JDK bug, but I might be doing something wrong as well.

...
zipStream = new ZipInputStream(
    new BufferedInputStream(new FileInputStream(archiveFile), BUFFER_SIZE),
    Charset.forName("ISO-8859-1")
);
while ((zipEntry = zipStream.getNextEntry()) != null) {
    // wrong name here, something like "M?nchen" instead of "München"
    System.out.println(zipEntry.getName());
    ...
}

Solution

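  • I played around for two hours or so, but just five minutes after I finally posted the question here, I bumped into the answer: the entry names in my zip file were not encoded in ISO-8859-1, but in Cp437 (IBM code page 437, the encoding the ZIP specification prescribes for names when the UTF-8 flag is not set). So the constructor call should be: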

    zipStream = new ZipInputStream(
        new BufferedInputStream(new FileInputStream(archiveFile), BUFFER_SIZE),
        Charset.forName("Cp437")
    );
    

    Now it works like a charm.
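
To see why the charset passed to the constructor matters, here is a minimal, self-contained sketch that reproduces the mismatch in memory. The class and method names (ZipNameCharsetDemo, zipWithEntry, firstEntryName) are made up for illustration, and it assumes the running JDK provides the Cp437 charset (it lives in the extended charsets, so a stripped-down runtime may not have it). It writes one entry named "München.txt" with Cp437 names, then reads the name back once with Cp437 and once with ISO-8859-1.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the original question or answer.
class ZipNameCharsetDemo {

    // Builds an in-memory archive with a single entry whose name is
    // encoded with the given charset.
    static byte[] zipWithEntry(String entryName, Charset nameCharset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes, nameCharset)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write(1); // one byte of dummy content
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns the name of the first entry, decoded with the given charset.
    static String firstEntryName(byte[] archive, Charset nameCharset) {
        try (ZipInputStream in =
                 new ZipInputStream(new ByteArrayInputStream(archive), nameCharset)) {
            ZipEntry entry = in.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: Cp437 is available in this JDK.
        Charset cp437 = Charset.forName("Cp437");
        Charset latin1 = Charset.forName("ISO-8859-1");

        String original = "M\u00fcnchen.txt"; // "München.txt"
        byte[] archive = zipWithEntry(original, cp437);

        // Matching charset: the name round-trips unchanged.
        System.out.println(original.equals(firstEntryName(archive, cp437)));

        // Wrong charset: in Cp437 "ü" is the byte 0x81, which ISO-8859-1
        // decodes as the control character U+0081, so the name differs.
        System.out.println(original.equals(firstEntryName(archive, latin1)));
    }
}
```

With the wrong charset no exception is thrown, because ISO-8859-1 maps every byte to some character; the name is just silently different, and a console will typically render the unprintable character as "?".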