What are the extra bytes in the ZipEntry used for?


The Java library for Zip files provides ZipEntry.getExtra(), which returns either a byte[] or null. What are the extra bytes in the ZipEntry used for? I'm aware of this question about archive attributes linked to getExtra(), but it doesn't explain what else the field is used for. Furthermore, that question indicates that some things stored in the extra field cannot be set from Java.


Solution

  • The answer can be found in the first two links in the java.util.zip package documentation.

    The basic zip format is described in the PKWARE zip specification. Sections 4.5 and 4.6 describe what the extra data is.

    The extra data is a series of zero or more blocks. Each block starts with a little-endian 16-bit ID, followed by a little-endian 16-bit count of the bytes that immediately follow.

    The PKWARE specification describes some well-known extra data record IDs. The Info-ZIP format describes many more.

    So, if you wanted to check whether a zip entry includes an ASi Unix Extra Field, you might read it like this:

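    // Needs java.nio.ByteBuffer, java.nio.ByteOrder and
    // java.nio.charset.StandardCharsets.
    byte[] extra = zipEntry.getExtra();   // null when the entry has no extra data
    ByteBuffer extraData = ByteBuffer.wrap(extra != null ? extra : new byte[0]);
    extraData.order(ByteOrder.LITTLE_ENDIAN);

    // Every block starts with a 4-byte header: 16-bit ID, 16-bit length.
    while (extraData.remaining() >= 4) {
        int id = extraData.getShort() & 0xffff;
        int length = extraData.getShort() & 0xffff;
        if (length > extraData.remaining()) {
            break;   // truncated or malformed block
        }
        int nextBlock = extraData.position() + length;

        // 0x756e is the ASi Unix header ID; its fixed part is 14 bytes long.
        if (id == 0x756e && length >= 14) {
            int crc32 = extraData.getInt();
            int permissions = extraData.getShort() & 0xffff;
            int linkLengthOrDeviceNumbers = extraData.getInt();
            int userID = extraData.getShort() & 0xffff;
            int groupID = extraData.getShort() & 0xffff;

            // Whatever remains in the block is the link destination, if any.
            ByteBuffer linkDestBuffer = extraData.slice();
            linkDestBuffer.limit(length - 14);
            String linkDestination =
                StandardCharsets.UTF_8.decode(linkDestBuffer).toString();

            // ...
        }

        // slice() does not advance extraData, so always jump to the next block.
        extraData.position(nextBlock);
    }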
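
If you only need to know which blocks an entry carries, the same header walk can be factored into a small helper that splits the extra data into a map from header ID to payload. This is a minimal sketch rather than anything provided by java.util.zip: the class name ExtraFieldBlocks and its parse method are hypothetical, and it assumes that a block whose declared length runs past the end of the data should simply end the scan.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of java.util.zip.
final class ExtraFieldBlocks {

    /**
     * Splits raw extra data into blocks keyed by header ID, in the order
     * they appear. If an ID occurs more than once, the last block wins.
     */
    static Map<Integer, byte[]> parse(byte[] extra) {
        Map<Integer, byte[]> blocks = new LinkedHashMap<>();
        if (extra == null) {
            return blocks; // getExtra() returns null when there is no extra data
        }

        ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);

        // Every block starts with a 4-byte header: 16-bit ID, 16-bit length.
        while (buf.remaining() >= 4) {
            int id = buf.getShort() & 0xffff;
            int length = buf.getShort() & 0xffff;
            if (length > buf.remaining()) {
                break; // assumption: stop scanning at a malformed block
            }
            byte[] data = new byte[length];
            buf.get(data); // copies the payload and advances past the block
            blocks.put(id, data);
        }
        return blocks;
    }
}
```

With this in place, ExtraFieldBlocks.parse(zipEntry.getExtra()).containsKey(0x756e) would tell you whether the entry has an ASi Unix Extra Field, and the returned byte[] can be wrapped in a little-endian ByteBuffer and decoded as shown above.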