java jvm bytecode unsigned

How to handle the unsigned types (especially u4) of a Java class file in a Java program?


From the Java Virtual Machine specification:

A class file consists of a stream of 8-bit bytes. All 16-bit, 32-bit, and 64-bit quantities are constructed by reading in two, four, and eight consecutive 8-bit bytes, respectively. Multibyte data items are always stored in big-endian order, where the high bytes come first. In the Java platform, this format is supported by interfaces java.io.DataInput and java.io.DataOutput and classes such as java.io.DataInputStream and java.io.DataOutputStream.

This chapter defines its own set of data types representing class file data: The types u1, u2, and u4 represent an unsigned one-, two-, or four-byte quantity, respectively. In the Java platform, these types may be read by methods such as readUnsignedByte, readUnsignedShort, and readInt of the interface java.io.DataInput.

Aside from the confusing mention of "64-bit quantities" (there is no u8; long and double are split into two u4 items), I don't understand how to handle the u4 type.

For u1 and u2 it's clear:

  • u1: read with readUnsignedByte, store in an int
  • u2: read with readUnsignedShort, store in an int

The specification advises this:

  • u4: read with readInt, store in an int (?)

What happens to values greater than Integer.MAX_VALUE? Does this advice silently imply that all values of type u4 are less than or equal to Integer.MAX_VALUE?
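To make the concern concrete, here is a small sketch of what readInt does with a u4 whose high bit is set. The class name SignedU4Demo and the helper readRaw are made up for illustration; only the java.io and java.lang.Integer calls are real API (the unsigned helpers on Integer assume Java 8 or later).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical demo class, not part of any library.
final class SignedU4Demo {
    // Reads the first four bytes as a big-endian int, exactly as readInt defines it.
    static int readRaw(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        // 0x80000000 is Integer.MAX_VALUE + 1 when interpreted as a u4.
        int raw = readRaw(new byte[] {(byte) 0x80, 0, 0, 0});

        System.out.println(raw);                           // -2147483648
        System.out.println(Integer.toUnsignedString(raw)); // 2147483648
        // Unsigned comparison still works on the raw int bits.
        System.out.println(Integer.compareUnsigned(raw, Integer.MAX_VALUE) > 0); // true
    }
}
```

So no bits are lost when a u4 is stored in an int; the value only looks negative if it is interpreted as signed.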

I came up with this idea:

  • u4: read with readUnsignedInt, store in a long

Unfortunately, there is no such method. But that's not a problem, since you can easily write your own:

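// Assumes this method lives in a subclass of java.io.DataInputStream, so readInt() is inherited.
public long readUnsignedInt() throws IOException {
    // Masking with a long literal discards the sign extension of the int.
    return readInt() & 0xFFFFFFFFL;
}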
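A self-contained variant of the same idea might look like the sketch below. It takes the stream as a parameter instead of subclassing, and uses Integer.toUnsignedLong (available since Java 8), which is equivalent to the mask. The class name U4Reader and the method name readU4 are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper class; the names are illustrative only.
final class U4Reader {
    // Reads four big-endian bytes and widens them to a non-negative long.
    static long readU4(DataInputStream in) throws IOException {
        // Same result as (in.readInt() & 0xFFFFFFFFL).
        return Integer.toUnsignedLong(in.readInt());
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFE};
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            System.out.println(readU4(in)); // 4294967294
        }
    }
}
```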

So, here are two questionable spots:

  1. The Code attribute:

Code_attribute {
...
u4 code_length;
u1 code[code_length];
...
}

Why is code_length not of type u2? Later it says:

The value of the code_length item must be less than 65536.
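Given that constraint, a reader can treat code_length as an unsigned value, validate it, and only then narrow it to an int for the array size. The following is a rough sketch under that assumption: the class CodeLengthCheck and method readCode are invented for illustration, and the stream is assumed to be positioned right at the code_length item. The lower bound (greater than zero) reflects my reading of the same section of the specification and is not quoted above.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper; not taken from any real class file parser.
final class CodeLengthCheck {
    // Largest value allowed by "must be less than 65536".
    static final long MAX_CODE_LENGTH = 65535L;

    // Reads code_length (u4) followed by code[code_length] (u1 array).
    static byte[] readCode(DataInputStream in) throws IOException {
        long codeLength = Integer.toUnsignedLong(in.readInt());
        if (codeLength == 0 || codeLength > MAX_CODE_LENGTH) {
            throw new IOException("invalid code_length: " + codeLength);
        }
        // Safe narrowing: the value was just checked to fit well within an int.
        byte[] code = new byte[(int) codeLength];
        in.readFully(code);
        return code;
    }
}
```

Validating on the widened long means a corrupt length such as 0x80000000 is rejected with a clear message rather than surfacing as a negative array size.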

  2. The SourceDebugExtension attribute:

SourceDebugExtension_attribute {
...
u4 attribute_length;
u1 debug_extension[attribute_length];
}
...
Note that the debug_extension array may denote a string longer than that which can be represented with an instance of class String.

Why? Can u4 values indeed exceed Integer.MAX_VALUE (since I think this is the maximum length of a String instance)?


Solution

    1. To make it easy to lift the 64K code length restriction, should the need arise.
    2. Since the specification nowhere says that u4 values cannot exceed Integer.MAX_VALUE, one must assume that they can. The JVM spec leaves nothing implicit.
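If u4 lengths may exceed Integer.MAX_VALUE, a parser that merely wants to skip an attribute body should keep the length in a long, because DataInput.skipBytes only accepts an int. Below is a minimal sketch of that approach; AttributeSkipper, skipFully and skipAttributeBody are hypothetical names, and the stream is assumed to be positioned at the attribute_length item.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; the names are illustrative only.
final class AttributeSkipper {
    // Skips exactly count bytes or throws EOFException if the stream ends first.
    static void skipFully(InputStream in, long count) throws IOException {
        long remaining = count;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() >= 0) {
                // skip() made no progress; consume a single byte instead.
                remaining--;
            } else {
                throw new EOFException(remaining + " bytes left to skip");
            }
        }
    }

    // Reads attribute_length as an unsigned u4 and skips that many bytes.
    static void skipAttributeBody(DataInputStream in) throws IOException {
        long attributeLength = Integer.toUnsignedLong(in.readInt());
        skipFully(in, attributeLength);
    }
}
```

Attributes whose contents must be held in memory, such as debug_extension, would still need chunked handling once the length exceeds what a single Java array can hold.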