Merging int[] and String[] should result in Object[]


After carefully reading JVMS §4.10.2.2, I noticed the following paragraph:

If corresponding values are both array reference types, then their dimensions are examined. If the array types have the same dimensions, then the merged value is a reference to an instance of an array type which is first common supertype of both array types. (If either or both of the array types has a primitive element type, then Object is used as the element type instead.) [...]

[...] Even int[] and String[] can be merged; the result is Object[], because Object is used instead of int when computing the first common supertype.

int[] is not a subtype of Object[], so this is interesting. Let's test it:

package test.se17;

import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.Label;
import org.objectweb.asm.MethodVisitor;
import static org.objectweb.asm.Opcodes.*;

import java.lang.invoke.MethodHandles;

public class CastArrayDump {
    
    public static byte[] dump() throws Exception {
        
        ClassWriter classWriter = new ClassWriter(0); // no COMPUTE_MAXS / COMPUTE_FRAMES
        MethodVisitor methodVisitor;
        
        classWriter.visit(V1_5, ACC_PUBLIC | ACC_SUPER, "test/se17/CastArray", null, // version 49: verified by type inference
                "java/lang/Object", null);
        
        {
            methodVisitor = classWriter.visitMethod(ACC_PUBLIC | ACC_STATIC, "cast",
                    "(Ljava/lang/Object;)[Ljava/lang/Object;", null, null);
            methodVisitor.visitCode();
            Label end = new Label();
            methodVisitor.visitVarInsn(ALOAD, 0);
            methodVisitor.visitTypeInsn(INSTANCEOF, "[I");
            Label notint = new Label();
            methodVisitor.visitJumpInsn(IFEQ, notint);
            methodVisitor.visitVarInsn(ALOAD, 0);
            methodVisitor.visitTypeInsn(CHECKCAST, "[I");
            methodVisitor.visitJumpInsn(GOTO, end); // stack: int[]
            methodVisitor.visitLabel(notint);
            methodVisitor.visitVarInsn(ALOAD, 0);
            methodVisitor.visitTypeInsn(INSTANCEOF, "[Ljava/lang/String;");
            Label notobj = new Label();
            methodVisitor.visitJumpInsn(IFEQ, notobj);
            methodVisitor.visitVarInsn(ALOAD, 0);
            methodVisitor.visitTypeInsn(CHECKCAST, "[Ljava/lang/String;");
            methodVisitor.visitJumpInsn(GOTO, end); // stack: String[]
            methodVisitor.visitLabel(notobj);
            methodVisitor.visitMethodInsn(INVOKESTATIC, "test/se17/CastArray",
                    "newIllegalArgumentException", "()Ljava/lang/IllegalArgumentException;", false);
            methodVisitor.visitInsn(ATHROW);
            methodVisitor.visitLabel(end); // merge point: stack holds int[] or String[]
            methodVisitor.visitInsn(ARETURN); // declared return type: Object[]
            methodVisitor.visitMaxs(2, 2);
            methodVisitor.visitEnd();
        }
        {
            methodVisitor =
                    classWriter.visitMethod(ACC_PRIVATE | ACC_STATIC, "newIllegalArgumentException",
                            "()Ljava/lang/IllegalArgumentException;", null, null);
            methodVisitor.visitCode();
            Label label0 = new Label();
            methodVisitor.visitTypeInsn(NEW, "java/lang/IllegalArgumentException");
            methodVisitor.visitInsn(DUP);
            methodVisitor.visitMethodInsn(INVOKESPECIAL, "java/lang/IllegalArgumentException",
                    "<init>", "()V", false);
            methodVisitor.visitInsn(ARETURN);
            methodVisitor.visitMaxs(2, 0);
            methodVisitor.visitEnd();
        }
        classWriter.visitEnd();
        
        return classWriter.toByteArray();
    }
    
    public static void main(String[] args) throws Exception {
        MethodHandles.lookup().defineClass(dump());
        test();
    }
    
    private static void test() {
        System.out.println(CastArray.cast(new String[0]));
    }
}

Note that CastArrayDump is compiled against a stub CastArray with a cast method that has the same signature.

When running this code I get the following exception:

Exception in thread "main" java.lang.VerifyError: (class: test/se17/CastArray, method: cast signature: (Ljava/lang/Object;)[Ljava/lang/Object;) Wrong return type in function
    at java.base/java.lang.ClassLoader.defineClass0(Native Method)
    at java.base/java.lang.System$2.defineClass(System.java:2307)
    at java.base/java.lang.invoke.MethodHandles$Lookup$ClassDefiner.defineClass(MethodHandles.java:2439)
    at java.base/java.lang.invoke.MethodHandles$Lookup$ClassDefiner.defineClass(MethodHandles.java:2416)
    at java.base/java.lang.invoke.MethodHandles$Lookup.defineClass(MethodHandles.java:1843)
    at test.se17/test.se17.CastArrayDump.main(CastArrayDump.java:69)

In the cast method, the two control-flow paths merge at the end label.
On the stack we have either a String[] or an int[], which, according to the quoted paragraph, should be merged into Object[].

An areturn instruction is type safe iff the enclosing method has a declared return type, ReturnType, that is a reference type, and one can validly pop a type matching ReturnType off the incoming operand stack.

Now, there should be an Object[] on the stack, because the two types were just merged into it.
But for some reason, this is not the case, and we get "Wrong return type in function".
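To make the mismatch concrete, here is a small sketch of the quoted areturn rule. It uses `Class.isAssignableFrom` as a stand-in for the verifier's assignability check, which is only an approximation (the verifier's own rules differ in details such as the handling of interface types), and the names `AreturnCheck` and `areturnTypeSafe` are hypothetical. An Object[] on the stack would pass, whereas a merge result of plain Object would fail, which would be consistent with the "Wrong return type in function" error.

```java
public class AreturnCheck {

    // Approximation of the quoted areturn rule: the declared return type must be a
    // reference type, and the type on top of the operand stack must be assignable to it.
    // Class.isAssignableFrom is only a stand-in for the verifier's own subtype check.
    static boolean areturnTypeSafe(Class<?> stackTop, Class<?> declaredReturn) {
        return !declaredReturn.isPrimitive() && declaredReturn.isAssignableFrom(stackTop);
    }

    public static void main(String[] args) {
        Class<?> declared = Object[].class; // return type of cast(Object)

        // Merge result as claimed by the quoted JVMS paragraph: accepted.
        System.out.println(areturnTypeSafe(Object[].class, declared)); // true
        // Merge result if int[] and String[] only share the superclass Object: rejected.
        System.out.println(areturnTypeSafe(Object.class, declared));   // false
        // The two incoming branch types on their own:
        System.out.println(areturnTypeSafe(String[].class, declared)); // true
        System.out.println(areturnTypeSafe(int[].class, declared));    // false
    }
}
```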

So, I must be missing something in the JVMS that forbids this.
Where in the JVMS does it state that this does not work?

(To be clear: I never expected this to work in the first place.)


Solution

  • The Java 7 JVM specification read:

    To merge two operand stacks, the number of values on each stack must be identical. The types of values on the stacks must also be identical, except that differently typed reference values may appear at corresponding places on the two stacks. In this case, the merged operand stack contains a reference to an instance of the first common superclass of the two types. Such a reference type always exists because the type Object is a superclass of all class and interface types. If the operand stacks cannot be merged, verification of the method fails.

    which is pretty straightforward and does not contain any special rules for array types. Since this rule requires consulting the JLS anyway regarding the common superclass of the two reference types, we can do the same for arrays, to find that, e.g., Integer[] and Long[] have the common superclass Number[], whereas int[]’s direct superclass is Object and int[] is assignable to Cloneable and Serializable, but not to Object[].
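A minimal sketch of that reading, using reflection as a stand-in for the verifier's type hierarchy. The `ArrayMerge` class and its `merge` helper are hypothetical and not the verifier's actual algorithm; interfaces such as Cloneable and Serializable are not considered as merge results, and primitive non-array types are not valid inputs.

```java
import java.lang.reflect.Array;

public class ArrayMerge {

    // Hypothetical "first common supertype" in the spirit of the Java 7 rule,
    // extended to arrays via the JLS array subtyping rules. Interfaces are not
    // considered as merge results; primitive (non-array) types are not valid inputs.
    static Class<?> merge(Class<?> a, Class<?> b) {
        if (a == b) {
            return a; // identical types stay as they are, e.g. int[] and int[]
        }
        if (a.isArray() && b.isArray()) {
            Class<?> ca = a.getComponentType();
            Class<?> cb = b.getComponentType();
            // A primitive component type (different from the other one) means there is
            // no common array supertype; the only common superclass is Object.
            if (ca.isPrimitive() || cb.isPrimitive()) {
                return Object.class;
            }
            // Reference component types: merge them, then rebuild the array type.
            return Array.newInstance(merge(ca, cb), 0).getClass();
        }
        // Walk a's superclass chain until it is a supertype of b (ends at Object).
        for (Class<?> c = a; c != null; c = c.getSuperclass()) {
            if (c.isAssignableFrom(b)) {
                return c;
            }
        }
        return Object.class; // a was an interface unrelated to b
    }

    public static void main(String[] args) {
        System.out.println(merge(Integer[].class, Long[].class).getSimpleName()); // Number[]
        System.out.println(merge(int[].class, int[].class).getSimpleName());      // int[]
        System.out.println(merge(int[].class, String[].class).getSimpleName());   // Object

        // int[] extends Object directly and implements Cloneable and Serializable,
        // but it is not an Object[].
        System.out.println(int[].class.getSuperclass());                              // class java.lang.Object
        System.out.println(Cloneable.class.isAssignableFrom(int[].class));            // true
        System.out.println(java.io.Serializable.class.isAssignableFrom(int[].class)); // true
        System.out.println(Object[].class.isAssignableFrom(int[].class));             // false
    }
}
```

With this reading, int[] and String[] merge to plain Object, which cannot be returned as an Object[].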

    Then, the specification was changed in Java 8 (note that this part about arrays is the only change to the “Verification by Type Inference” section).

    Since then, it reads as you already cited:

    If corresponding values are both array reference types, then their dimensions are examined. If the array types have the same dimensions, then the merged value is a reference to an instance of an array type which is first common supertype of both array types.

    So far, nothing has changed. The paragraph could stop here instead of contradicting itself in the subsequent part:

    (If either or both of the array types has a primitive element type, then Object is used as the element type instead.)

    Keep in mind that there is no extra rule for the case where both types are the same. So if we take this insertion at face value, it even implies that merging int[] and int[] has to result in Object[], because “either or both of the array types has a primitive element type”.
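Taken literally, the sentence can be sketched as follows. `FaceValueMerge.mergeAtFaceValue` is a hypothetical helper that models only the quoted wording, for one-dimensional arrays and ignoring interfaces; it is not how any verifier works. It makes the consequence explicit: int[] merged with int[] comes out as Object[], on which a subsequent iaload would have to be rejected.

```java
import java.lang.reflect.Array;

public class FaceValueMerge {

    // Hypothetical sketch of the Java 8 sentence taken literally, for two one-dimensional
    // array types: a primitive element type is replaced by Object *before* the first
    // common supertype is computed. There is no special case for identical types,
    // and interfaces are ignored as merge results.
    static Class<?> mergeAtFaceValue(Class<?> a, Class<?> b) {
        if (!a.isArray() || !b.isArray()
                || a.getComponentType().isArray() || b.getComponentType().isArray()) {
            throw new IllegalArgumentException("expected two one-dimensional array types");
        }
        Class<?> ea = a.getComponentType().isPrimitive() ? Object.class : a.getComponentType();
        Class<?> eb = b.getComponentType().isPrimitive() ? Object.class : b.getComponentType();

        // First common superclass of the element types; falls back to Object
        // when ea is an interface (its getSuperclass() is null).
        Class<?> common = ea;
        while (common != null && !common.isAssignableFrom(eb)) {
            common = common.getSuperclass();
        }
        if (common == null) {
            common = Object.class;
        }
        return Array.newInstance(common, 0).getClass();
    }

    public static void main(String[] args) {
        System.out.println(mergeAtFaceValue(int[].class, String[].class).getSimpleName());   // Object[]
        System.out.println(mergeAtFaceValue(int[].class, int[].class).getSimpleName());      // Object[]
        System.out.println(mergeAtFaceValue(Integer[].class, Long[].class).getSimpleName()); // Number[]
    }
}
```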

    This is easy to disprove: the following code

    int[] array = Math.random() > 0.5? new int[5]: new int[10];
    System.out.println(array[3]);
    

    works with all Java versions and does not cause problems like “iaload used with the (merged) type Object[]”.

    I’m also skipping the part about different dimensions, as well as the obvious examples.

    … Even int[] and String[] can be merged; the result is Object[], because Object is used instead of int when computing the first common supertype.

    This is a special situation. This time, it’s not problematic wording or an overly complex matter leading to misinterpretation; here, the author explicitly states that it is meant that way, and the way it’s meant is just … as wrong as information can be.

    It even violates common sense.

    You already wrote that you never expected this to work in the first place, and how could it? What should the JVM do in the subsequent code if it really allowed an int[] to pass as an Object[]?

    No part of the specification explains how the JVM should deal with it, and whichever way it were treated, it would require drastic changes to the implementation.

    One option would be to produce errors when later code tries to access the int[] array as an Object[]. But it makes no sense to change the JVM to allow something just to forbid the result elsewhere.

    The other option would be to implement an autoboxing feature at the JVM level. Given the amount of work such a change requires, it would be a weird approach to implement such a feature and enable it only when uncommon bytecode with an outdated class file version merges two differently typed arrays from two code paths.
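One way to see why this would be more than a verifier tweak: at the library level, the straightforward way to get an Object[] from an int[] is to box each element into a new array, and that copy no longer aliases the original. The sketch below (the `BoxedCopy` class and `boxedCopy` helper are hypothetical illustrations, not a proposal for how a JVM would do it) shows two problems such a feature would have to solve: writes through the copy do not reach the int[], and the Object[] accepts elements an int[] cannot represent.

```java
import java.util.Arrays;

public class BoxedCopy {

    // Hypothetical helper: produces a boxed copy of an int[], typed as Object[].
    static Object[] boxedCopy(int[] ints) {
        return Arrays.stream(ints).boxed().toArray();
    }

    public static void main(String[] args) {
        int[] ints = {1, 2, 3};
        Object[] objs = boxedCopy(ints);

        objs[0] = 42;                // write through the Object[] copy...
        System.out.println(ints[0]); // ...prints 1: the original int[] is unaffected

        objs[1] = "not an int";      // an Object[] also accepts values no int[] can hold
        System.out.println(Arrays.toString(objs)); // [42, not an int, 3]
    }
}
```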

    Likewise, it would be strange to insert such a statement into the specification just to add another rule elsewhere explaining why it doesn’t work. So I don’t think you’re missing anything; I don’t expect to find such a statement anywhere. This obviously wrong statement shouldn’t exist in the first place.