I am looking for a way to identify the receiver of a method invocation when analyzing Java bytecode, that is, to determine whether the receiver comes from a class field (and which one) or from a method argument.
Take the bytecode below as an example. There are two fields: _caller1 and _caller2.
public Class MyClass{
test.code.jit.asm.classInline.CI_Caller1 _caller1;
flags:
test.code.jit.asm.classInline.CI_Caller1 _caller2;
flags:
public int test(java.lang.String, java.lang.String, test.code.jit.asm.classInline.CI_Caller1);
flags: ACC_PUBLIC
Code:
stack=4, locals=5, args_size=3
0: aload_0
1: getfield #14 // Field _caller1:Ltest/code/jit/asm/classInline/CI_Caller1;
4: invokevirtual #26 // Method test/code/jit/asm/classInline/CI_Caller1.test_two_fields_callee:()I
7: istore_3
8: aload_0
9: getfield #16 // Field _caller2:Ltest/code/jit/asm/classInline/CI_Caller1;
12: invokevirtual #26 // Method test/code/jit/asm/classInline/CI_Caller1.test_two_fields_callee:()I
15: istore 4
17: getstatic #32 // Field java/lang/System.out:Ljava/io/PrintStream;
20: new #38 // class java/lang/StringBuilder
23: dup
.....
72: ireturn
What I want to know is how to identify the receivers of the method invocations at offsets 4 and 12. Are they class fields (and which ones) or method arguments? This is relatively easy for a human reader, but how can I implement it in Java code? (An existing tool would be even better.)
Currently I am using the ASM framework to parse the bytecode of each class. Any ideas would be appreciated (it seems I have to build a bytecode AST here); pointers to Java utilities or related links would also help.
When an invokevirtual instruction is executed, all arguments are popped off the stack, followed by the receiver object. Your example is the most trivial case: the method has no arguments to pop, so the instruction right before the invocation supplies the receiver. Even for a no-arg method this is the simplest situation, since in theory there could be a stack-neutral instruction sequence between the instruction providing the receiver and the invocation. The preceding field read is also the most trivial case, as it is conveniently preceded by the aload_0 instruction, which provides the instance whose field is being read. As long as there is no preceding write to variable 0, it will still contain the this instance, if we are looking at a non-static method…
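To know how many stack values sit on top of the receiver, you need the argument count of the invoked method, which can be read off its descriptor. The following is a minimal sketch in plain Java; the class and method names (DescriptorArgs, countArguments) are made up for illustration, and it assumes a well-formed descriptor. It counts values, so a long or double counts as one argument here.

```java
public class DescriptorArgs {

    /**
     * Counts the arguments in a JVM method descriptor such as "(Ljava/lang/String;I)V".
     * Assumes the descriptor is well-formed; no validation is done.
     */
    static int countArguments(String descriptor) {
        int count = 0;
        int i = descriptor.indexOf('(') + 1;
        while (descriptor.charAt(i) != ')') {
            // Skip array dimensions; the element type follows them.
            while (descriptor.charAt(i) == '[') {
                i++;
            }
            // An object type runs from 'L' up to the terminating ';'.
            if (descriptor.charAt(i) == 'L') {
                i = descriptor.indexOf(';', i);
            }
            i++; // step past the primitive type character or the ';'
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Descriptor of test_two_fields_callee from the question: no arguments.
        System.out.println(countArguments("()I"));
    }
}
```

For the invocation in the question the count is zero, so the receiver is the top stack value. With n arguments it is the value n entries below the top.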
Having named all these lucky coincidences, it should be mentioned that for ordinary Java code and mainstream compilers, most of these prerequisites hold. So if you can live with covering, say, 99% of all code, the biggest obstacle is the arguments on top of the stack, which may be produced by arbitrary expressions, including conditionals, so the code between the provider of the receiver instance and the actual invocation can be quite long.
The only way to trace back to the instruction that pushed the method receiver is to scan the code forward, modeling the operand stack as a stack of objects that record their source instruction, and to interpret every instruction's effect on that operand stack. Note that the groundwork for such an interpreter already exists.
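The idea can be sketched without any library. The example below is a simplified illustration, not a full interpreter: the Insn class and its pops/pushes fields are hypothetical, the stack effects are supplied by hand, and it handles straight-line code only (no branches, and dup or swap would be reported as sources themselves). For real class files, ASM's org.objectweb.asm.tree.analysis package (Analyzer combined with SourceInterpreter) does this kind of source tracking, including control flow.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReceiverTracker {

    /** Hypothetical, simplified instruction: offset, mnemonic, operand, and stack effect. */
    static final class Insn {
        final int offset;
        final String op;
        final String operand;
        final int pops;   // values consumed (for an invoke: arguments + receiver)
        final int pushes; // values produced

        Insn(int offset, String op, String operand, int pops, int pushes) {
            this.offset = offset;
            this.op = op;
            this.operand = operand;
            this.pops = pops;
            this.pushes = pushes;
        }

        @Override
        public String toString() {
            return offset + ": " + op + (operand.isEmpty() ? "" : " " + operand);
        }
    }

    /** Only these invocation kinds take a receiver from the operand stack. */
    static boolean hasReceiver(String op) {
        return op.equals("invokevirtual") || op.equals("invokeinterface")
                || op.equals("invokespecial");
    }

    /** Maps the offset of each receiver-taking invoke to the instruction that pushed its receiver. */
    static Map<Integer, Insn> findReceivers(List<Insn> code) {
        Deque<Insn> stack = new ArrayDeque<>(); // each entry is the instruction that produced the value
        Map<Integer, Insn> receivers = new LinkedHashMap<>();
        for (Insn insn : code) {
            Insn deepest = null;
            for (int i = 0; i < insn.pops; i++) {
                deepest = stack.pop(); // the last value popped lies deepest on the stack
            }
            if (hasReceiver(insn.op)) {
                receivers.put(insn.offset, deepest); // arguments come off first, the receiver last
            }
            for (int i = 0; i < insn.pushes; i++) {
                stack.push(insn);
            }
        }
        return receivers;
    }

    public static void main(String[] args) {
        // The first instructions of the method from the question, with hand-written stack effects.
        List<Insn> code = Arrays.asList(
                new Insn(0, "aload_0", "", 0, 1),
                new Insn(1, "getfield", "_caller1", 1, 1),
                new Insn(4, "invokevirtual", "test_two_fields_callee", 1, 1),
                new Insn(7, "istore_3", "", 1, 0),
                new Insn(8, "aload_0", "", 0, 1),
                new Insn(9, "getfield", "_caller2", 1, 1),
                new Insn(12, "invokevirtual", "test_two_fields_callee", 1, 1),
                new Insn(15, "istore", "4", 1, 0));
        findReceivers(code).forEach((offset, source) ->
                System.out.println("invoke at " + offset + " <- " + source));
    }
}
```

For the question's code this reports the getfield of _caller1 as the receiver source of the invocation at offset 4 and the getfield of _caller2 for the one at offset 12. Telling a field from a method argument then amounts to checking whether the source instruction is a getfield or a load from a parameter slot.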