java zip corruption

Why am I receiving the error "start of central directory not found" for copied zips?


I’m recursively unarchiving zip files in memory, reading and injecting content into any found placeholders, then packaging them all back up again and creating an output file.

Here’s the method in question:

public void unpackZipFile(InputStream in, OutputStream out) throws IOException {
    ZipInputStream zin = new ZipInputStream(in);
    ByteArrayOutputStream bout = new ByteArrayOutputStream();
    ZipOutputStream zos = new ZipOutputStream(bout);
    for (ZipEntry entry = zin.getNextEntry(); entry != null; entry = zin.getNextEntry()) {
        if (entry.isDirectory() || entry.getName().startsWith("__MACOSX/")) continue;
        zos.putNextEntry(new ZipEntry(entry.getName()));
        processInputStream(zin, zos);
        zos.closeEntry();
    }
    zos.close();
    bout.writeTo(out);
}

Unfortunately, most zip unarchivers complain about the resulting file. For example:

warning [1-master.zip.data]:  1198 extra bytes at beginning or within zipfile
(attempting to process anyway)
error [1-master.zip.data]:  start of central directory not found;
zipfile corrupt.
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)

The unarchivers that don't complain, however, produce exactly what is expected. None of the files seem corrupted; their contents are as expected and they run as expected. The only files that seem affected are the zips themselves, which all have this problem regardless of whether they were the outermost zip or a nested one.

I’ve been attempting to uncover what the issue could be for the past several days without luck and figured a fresh set of eyes might help shed light on my ignorance.

Edit: The entire class: https://gist.github.com/justisr/0b127182fb143c06a1888f83a628995f


Solution

  • The class file provided by Justis gives us an inkling as to what's going wrong here.

    https://gist.github.com/justisr/0b127182fb143c06a1888f83a628995f

    public void processInputStream(String loc, InputStream in, OutputStream out) throws IOException {
        if (processLocation(loc)) {
            switch (parseFormat((PushbackInputStream) (in = new PushbackInputStream(in, 8)))) {
            case CLASS:
                processClassFile(loc, in, out);
                break;
            case EMPTY_OBJECT:
                copy(in, out);
                break;
            case OTHER:
                copyReplace(in, out, loc);
                break;
            case RAR:
                copy(in, out);
                break;
            case ZIP_OR_JAR:
                unpackZipFile(loc, in, out);
                break;
            case SCHEMATIC:
                break;
            case PNG:
                break;
            }
        }
        //<-- valid ZIP within output stream to this point
        copy(in, out);
    }
    

    In the specific ZIP file causing this issue, unneeded/arbitrary data is located at the end of the file. The ZipInputStream knows to ignore it: any call to .read() on the ZipInputStream returns -1 (end of stream) instead of that data.

    However, this outer call to copy() operates directly on the base input and output streams, so it reads this unneeded data anyway and writes it after the already complete ZIP, thus corrupting the ZIP file.

    Knowing this, the obvious fix is to do the plain copy in the switch's default case and remove this outer call to copy(), so that the unneeded data is never written directly to the output stream.
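
The effect described above can be reproduced in isolation. The following is a minimal sketch, not the code from the gist: the class TrailingBytesDemo and its helpers (copy, repack, zipWithTrailingBytes, repackThenCopyRest, repackOnly, endsWithEocd) are hypothetical names made up for illustration, and the 4096 bytes of zero padding simply stand in for whatever the base stream still holds after the ZipInputStream has finished. It re-packs a zip once with a stray copy of the base stream afterwards and once without it, then checks whether the output still ends with the end-of-central-directory record that unarchivers look for.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class TrailingBytesDemo {

    /** Plain stream-to-stream copy (hypothetical stand-in for the gist's copy()). */
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        for (int n = in.read(buf); n != -1; n = in.read(buf)) {
            out.write(buf, 0, n);
        }
    }

    /** Re-packs each entry of the zip on 'in' into a fresh zip on 'out'. */
    static void repack(InputStream in, OutputStream out) throws IOException {
        ZipInputStream zin = new ZipInputStream(in);
        ZipOutputStream zos = new ZipOutputStream(out);
        for (ZipEntry e = zin.getNextEntry(); e != null; e = zin.getNextEntry()) {
            zos.putNextEntry(new ZipEntry(e.getName()));
            copy(zin, zos);   // read() returns -1 at the end of the current entry
            zos.closeEntry();
        }
        zos.finish();         // writes the central directory, leaves 'out' open
    }

    /** Builds a one-entry zip followed by 'extra' bytes that are not part of the archive. */
    static byte[] zipWithTrailingBytes(int extra) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ZipOutputStream zos = new ZipOutputStream(bytes);
        zos.putNextEntry(new ZipEntry("a.txt"));
        zos.write("hello".getBytes(StandardCharsets.UTF_8));
        zos.closeEntry();
        zos.finish();
        bytes.write(new byte[extra]);   // padding after the archive's last record
        return bytes.toByteArray();
    }

    /** True if the data ends with a comment-less end-of-central-directory record. */
    static boolean endsWithEocd(byte[] zip) {
        int i = zip.length - 22;        // such a record is exactly 22 bytes long
        return i >= 0 && zip[i] == 'P' && zip[i + 1] == 'K'
                && zip[i + 2] == 5 && zip[i + 3] == 6;
    }

    /** Mimics the problematic flow: re-pack, then copy what the base stream still holds. */
    static byte[] repackThenCopyRest(byte[] source) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(source);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        repack(in, out);
        copy(in, out);                  // the stray outer copy
        return out.toByteArray();
    }

    /** The fix: re-pack only, and never touch the base stream afterwards. */
    static byte[] repackOnly(byte[] source) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        repack(new ByteArrayInputStream(source), out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] source = zipWithTrailingBytes(4096);
        byte[] broken = repackThenCopyRest(source);
        byte[] fixed = repackOnly(source);
        System.out.println("stray bytes appended:  " + (broken.length - fixed.length));
        System.out.println("broken ends with EOCD: " + endsWithEocd(broken));
        System.out.println("fixed ends with EOCD:  " + endsWithEocd(fixed));
    }
}
```

In this sketch the output of repackOnly() stops right after the record written by ZipOutputStream.finish(), while the output of repackThenCopyRest() carries extra bytes behind it, which is the kind of layout that strict tools such as unzip reject.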