compiler-construction bytecode abstract-syntax-tree java-bytecode-asm

Parse the receiver of a Java method invocation at the bytecode level

I am looking for a way to recognize the receiver of a method invocation when analyzing Java bytecode, that is, to identify whether the receiver is one of the class's field members (and which one) or one of the method's arguments.

Take the bytecode below as an example. There are two field members: _caller1 and _caller2.

public class MyClass {
  test.code.jit.asm.classInline.CI_Caller1 _caller1;
    flags: 

  test.code.jit.asm.classInline.CI_Caller1 _caller2;
    flags: 

  public int test(java.lang.String, java.lang.String, test.code.jit.asm.classInline.CI_Caller1);
    flags: ACC_PUBLIC
    Code:
      stack=4, locals=5, args_size=3
         0: aload_0       
         1: getfield      #14                 // Field _caller1:Ltest/code/jit/asm/classInline/CI_Caller1;
         4: invokevirtual #26                 // Method test/code/jit/asm/classInline/CI_Caller1.test_two_fields_callee:()I
         7: istore_3      
         8: aload_0       
         9: getfield      #16                 // Field _caller2:Ltest/code/jit/asm/classInline/CI_Caller1;
        12: invokevirtual #26                 // Method test/code/jit/asm/classInline/CI_Caller1.test_two_fields_callee:()I
        15: istore        4
        17: getstatic     #32                 // Field java/lang/System.out:Ljava/io/PrintStream;
        20: new           #38                 // class java/lang/StringBuilder
        23: dup           
         .....
        72: ireturn

What I want to know is how I can recognize the receivers of the method invocations at offsets 4 and 12. Are the receivers class field members (and if so, which one) or method arguments? It is relatively easy to see by eye, but how can I implement it in Java code? (An existing tool would be even better.)

Currently I am using the Java ASM framework to parse class bytecode sequences. I would appreciate any ideas (it seems I would have to build a bytecode AST here); pointers to existing Java utilities or related links would also be helpful.

Solution

When an invokevirtual instruction is executed, all arguments are popped off the operand stack first, followed by the receiver object. So your example is the most trivial one: the invoked method has no arguments, so the instruction right before the invocation supplies the receiver. But even for a no-arg method, that is only the simplest case; in theory, there could be a stack-neutral instruction sequence between the instruction providing the receiver and the invocation. Likewise, the preceding field read is the most trivial case, as it is luckily preceded by the aload_0 instruction, which provides the instance whose field is being read. And as long as there is no preceding write to variable 0, that variable will still contain the this instance, provided we are looking at a non-static method…
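The pop order means that, for invokevirtual (and likewise invokeinterface and invokespecial), the receiver sits exactly as many stack entries below the top as the invoked method has parameters. A minimal sketch, assuming you have the method descriptor string (as in ASM's `MethodInsnNode.desc`) and a value-based stack model in which each value, including long and double, counts as one entry rather than the JVM's two slots; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class ReceiverDepth {

    /**
     * Splits a JVM method descriptor such as "(ILjava/lang/String;[J)V"
     * into its parameter type descriptors. In a value-based stack model,
     * each parameter is one stack entry (even long/double).
     */
    static List<String> parameterTypes(String descriptor) {
        List<String> params = new ArrayList<>();
        int i = 1; // skip the opening '('
        while (descriptor.charAt(i) != ')') {
            int start = i;
            while (descriptor.charAt(i) == '[') {
                i++; // array dimensions
            }
            if (descriptor.charAt(i) == 'L') {
                i = descriptor.indexOf(';', i); // object types run up to ';'
            }
            i++; // consume the primitive char or the ';'
            params.add(descriptor.substring(start, i));
        }
        return params;
    }

    /**
     * Depth of the receiver below the top of the stack just before the call
     * (0 = top of stack): all arguments lie above it.
     */
    static int receiverDepth(String descriptor) {
        return parameterTypes(descriptor).size();
    }

    public static void main(String[] args) {
        // test_two_fields_callee:()I from the question: receiver is on top
        System.out.println(receiverDepth("()I"));
        System.out.println(receiverDepth("(Ljava/lang/String;[IJ)V"));
    }
}
```

For the question's `()I` calls the depth is 0, which is why the instruction right before each invokevirtual is the one to look at; with `(Ljava/lang/String;[IJ)V` the receiver would be three entries down.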

After naming all these lucky coincidences, it should be mentioned that for ordinary Java code and the mainstream compilers, most of these prerequisites will hold. So if you can live with covering, say, 99% of all code, the biggest obstacle is the arguments on top of the stack, which might be produced by arbitrary expressions, including conditionals, so the code between the provider of the receiver instance and the actual invocation can be quite long.
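To illustrate how argument expressions push the receiver further back, here is a simplification that is not the forward approach described next: for straight-line code, walking backwards with each instruction's stack effect (values popped and pushed) finds the producer of the receiver even with several argument expressions in between. It assumes a hypothetical `Insn` model with hand-entered stack effects and breaks as soon as branches join the path (e.g. a conditional expression in an argument) or dup/swap-style instructions shuffle values:

```java
import java.util.List;

public class BackwardScan {

    // Hypothetical simplified instruction: mnemonic text plus its stack effect.
    record Insn(String text, int pops, int pushes) {}

    /**
     * Returns the index of the instruction that pushed the value lying
     * `depth` entries from the top (1 = top) of the operand stack just
     * before code.get(before) executes, or -1 if the scan runs off the start.
     * Only valid for straight-line code without jumps or dup/swap shuffling.
     */
    static int producerOf(List<Insn> code, int before, int depth) {
        for (int i = before - 1; i >= 0; i--) {
            Insn insn = code.get(i);
            if (insn.pushes() >= depth) {
                return i; // the wanted value is among this instruction's pushes
            }
            // undo this instruction: its pushes disappear, its pops reappear
            depth = depth - insn.pushes() + insn.pops();
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical call _caller1.someMethod(a + b, c) with int locals 1, 2, 3.
        List<Insn> code = List.of(
            new Insn("aload_0", 0, 1),
            new Insn("getfield _caller1", 1, 1),
            new Insn("iload_1", 0, 1),
            new Insn("iload_2", 0, 1),
            new Insn("iadd", 2, 1),
            new Insn("iload_3", 0, 1),
            new Insn("invokevirtual someMethod:(II)V", 3, 0));
        int call = 6;
        int argCount = 2;
        int receiver = producerOf(code, call, argCount + 1); // receiver is below the args
        int fieldOwner = producerOf(code, receiver, 1);      // objectref read by getfield
        System.out.println(code.get(receiver).text());
        System.out.println(code.get(fieldOwner).text());
    }
}
```

Here the receiver's producer is `getfield _caller1`, five instructions before the call, and applying `producerOf` again to that `getfield` with depth 1 leads to `aload_0`, i.e. `this` as long as local 0 is never overwritten.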

The only way to track back to the instruction which pushed the method receiver is to scan the code forward, model the operand stack as a stack of objects storing their source instruction, and interpret each instruction's effect on that operand stack. Note that the groundwork for such an interpreter already exists.
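A minimal forward interpreter along those lines, over a hypothetical, simplified instruction model (only loads, stores, getfield and invokevirtual; straight-line code; an instance method) rather than ASM's real classes. Each stack entry remembers the offset of the instruction that pushed it plus a readable origin, so when an invocation pops its arguments and then its receiver, the receiver's origin can simply be read off:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReceiverTracker {

    // Hypothetical, heavily simplified instruction model (straight-line code only).
    enum Op { LOAD, STORE, GETFIELD, INVOKEVIRTUAL }

    record Insn(int offset, Op op, int local, String name, int argCount, boolean returnsValue) {}

    // A symbolic stack entry: offset of the instruction that pushed it, and what it denotes.
    record Value(int sourceOffset, String origin) {}

    static Insn load(int off, int local)  { return new Insn(off, Op.LOAD, local, null, 0, false); }
    static Insn store(int off, int local) { return new Insn(off, Op.STORE, local, null, 0, false); }
    static Insn getfield(int off, String field) { return new Insn(off, Op.GETFIELD, -1, field, 0, false); }
    static Insn invoke(int off, String method, int args, boolean ret) {
        return new Insn(off, Op.INVOKEVIRTUAL, -1, method, args, ret);
    }

    /** Interprets the code forwards; returns invocation offset -> receiver value. */
    static Map<Integer, Value> receivers(List<Insn> code, int maxLocals, int paramCount) {
        Value[] locals = new Value[maxLocals];
        locals[0] = new Value(-1, "this"); // assumes a non-static method
        for (int p = 1; p <= paramCount; p++) {
            locals[p] = new Value(-1, "parameter " + p);
        }
        Deque<Value> stack = new ArrayDeque<>();
        Map<Integer, Value> result = new LinkedHashMap<>();
        for (Insn insn : code) {
            switch (insn.op()) {
                // a load copies the local's origin but records the load as the pusher
                case LOAD -> stack.push(new Value(insn.offset(), locals[insn.local()].origin()));
                // a store (even to local 0) replaces what the local denotes
                case STORE -> locals[insn.local()] = stack.pop();
                case GETFIELD -> {
                    Value owner = stack.pop(); // objectref whose field is read
                    stack.push(new Value(insn.offset(), owner.origin() + "." + insn.name()));
                }
                case INVOKEVIRTUAL -> {
                    for (int a = 0; a < insn.argCount(); a++) {
                        stack.pop(); // arguments are popped first ...
                    }
                    Value receiver = stack.pop(); // ... then the receiver
                    result.put(insn.offset(), receiver);
                    if (insn.returnsValue()) {
                        stack.push(new Value(insn.offset(), "result of " + insn.name() + "()"));
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Offsets 0-15 of test() from the question; locals=5, args_size=3 (this + 2 parameters).
        List<Insn> code = List.of(
            load(0, 0), getfield(1, "_caller1"),
            invoke(4, "test_two_fields_callee", 0, true), store(7, 3),
            load(8, 0), getfield(9, "_caller2"),
            invoke(12, "test_two_fields_callee", 0, true), store(15, 4));
        System.out.println(receivers(code, 5, 2));
    }
}
```

Run on offsets 0 to 15 of the question's listing, this reports `this._caller1` (pushed at offset 1) as the receiver of the call at offset 4 and `this._caller2` (pushed at offset 9) for the call at offset 12. Keeping the origin as a string keeps the sketch short; a real tool would keep a reference to the producing instruction and follow it. A complete implementation must also follow jumps and merge the stack states where control flow joins. ASM's `org.objectweb.asm.tree.analysis.Analyzer` combined with `SourceInterpreter` performs this kind of forward interpretation, and each resulting `SourceValue` records the set of instructions that may have produced it.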