Tags: java, bytecode, java-bytecode-asm, bytecode-manipulation, jvm-bytecode

How to get references in ASM?


Summary: Using ASM, given a class's bytecode, I need to find, for each method call instruction (MethodInsnNode), the object references it uses.

Consider the following method:

public void myMethod() {
    String str1 = "str12";
    String str2 = str1;
    String str3 = "str3";
    Boolean myBool = true;
    Boolean myBool2 = true;
    Cemo cemo = new Cemo();
    assertTrue(cemo.isTrue());

    assertTrue(cemo.isTrue());
}

And the bytecode instructions generated for it:

Code:
   0: aload_0
   1: invokespecial #1                  // Method java/lang/Object."<init>":()V
   4: return



public void myMethod();
    Code:
       0: ldc           #2                  // String str12
       2: astore_1
       3: aload_1
       4: astore_2
       5: ldc           #3                  // String str3
       7: astore_3
       8: iconst_1
       9: invokestatic  #4                  // Method java/lang/Boolean.valueOf:(Z)Ljava/lang/Boolean;
      12: astore        4
      14: iconst_1
      15: invokestatic  #4                  // Method java/lang/Boolean.valueOf:(Z)Ljava/lang/Boolean;
      18: astore        5
      20: new           #5                  // class com/devfactory/utqg/analysis/InstrumentationClass$Cemo
      23: dup
      24: aconst_null
      25: invokespecial #6                  // Method com/devfactory/utqg/analysis/InstrumentationClass$Cemo."<init>":(Lcom/devfactory/utqg/analysis/InstrumentationClass$1;)V
      28: astore        6
      30: aload_0
      31: aload         6
      33: invokevirtual #7                  // Method com/d/utqg/analysis/InstrumentationClass$Cemo.isTrue:()Z
      36: invokestatic  #4                  // Method java/lang/Boolean.valueOf:(Z)Ljava/lang/Boolean;
      39: invokespecial #8                  // Method assertTrue:(Ljava/lang/Boolean;)V
      42: aload_0
      43: aload         6
      45: invokevirtual #7                  // Method com/d/utqg/analysis/InstrumentationClass$Cemo.isTrue:()Z
      48: invokestatic  #4                  // Method java/lang/Boolean.valueOf:(Z)Ljava/lang/Boolean;
      51: invokespecial #8                  // Method assertTrue:(Ljava/lang/Boolean;)V
      54: return
}

I'm trying to figure out how to get, using ASM, the object references that methods are called on. At the bytecode level, before any invoke instruction (INVOKEVIRTUAL, INVOKESPECIAL, and so on) executes, the values it will use are loaded onto the operand stack. For example:

      31: aload         6                   // Load the value stored in local variable slot 6
      33: invokevirtual #7                  // Method com/d/utqg/analysis/InstrumentationClass$Cemo.isTrue:()Z

So the bytecode does refer to the object there. But in ASM, the MethodInsnNode itself holds no such reference. The exact structure looks like this: the current instruction carries a "prev" attribute, which is the instruction that was executed to load that variable:

[Image: sample of inspected element]

The problem is that the node exposes the owner attribute and the name attribute, but I can't get a reference to the object itself. In the following case:

Boolean myBool2 = true;
Cemo cemo = new Cemo();
assertTrue(cemo.isTrue());

I need a reference to the "cemo" object in ASM.

What I've tried so far:

  • Getting the frame object, but it only contains the variable "slots", with no references in them.
  • Analyzing the instructions that precede the MethodInsnNode.

How should I accomplish this?


Solution

  • The JVM is a stack machine: a method is always invoked on the topmost values of the operand stack, where the this reference is the implicit first argument of a non-static method. To do what you plan, you would need to keep track of all values on the operand stack at every point, so that when you process a method call in the bytecode you can determine which value is currently filling in for this.

    This means that you would need to process every instruction of a method and keep track of what object each local variable ("register") and stack slot currently refers to. In a limited manner, this allows you to track the instance on which a method is called. Note, however, that Java bytecode programs can be very complex: bytecode imposes different limitations than the Java programming language and allows arbitrary jumps in the code. In the general case, to know what a method does at any point, you would have to emulate its execution, so you are setting yourself up for something rather difficult.
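To make the bookkeeping described above concrete, here is a minimal sketch of such tracking for straight-line code only. It is a toy model, not ASM: ReceiverTracker, Insn, Kind, receivers and sample are hypothetical names invented for illustration, and the instruction list is a hand-simplified version of offsets 20 to 39 from the question. It assumes no jumps and one stack slot per value, which is exactly the limitation the answer warns about.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Toy model only: none of these types belong to ASM; all names are hypothetical. */
final class ReceiverTracker {

    enum Kind { PUSH, LOAD, STORE, INVOKE }

    /** A drastically simplified instruction. */
    static final class Insn {
        final Kind kind;
        final String label;         // description (PUSH) or method name (INVOKE)
        final int slot;             // local variable index for LOAD/STORE, else -1
        final int argCount;         // explicit arguments popped by INVOKE
        final boolean hasReceiver;  // false for static calls
        final boolean returnsValue; // false for void methods

        private Insn(Kind kind, String label, int slot, int argCount,
                     boolean hasReceiver, boolean returnsValue) {
            this.kind = kind;
            this.label = label;
            this.slot = slot;
            this.argCount = argCount;
            this.hasReceiver = hasReceiver;
            this.returnsValue = returnsValue;
        }

        static Insn push(String what) { return new Insn(Kind.PUSH, what, -1, 0, false, false); }
        static Insn load(int slot)    { return new Insn(Kind.LOAD, null, slot, 0, false, false); }
        static Insn store(int slot)   { return new Insn(Kind.STORE, null, slot, 0, false, false); }
        static Insn invoke(String method, int argCount, boolean hasReceiver, boolean returnsValue) {
            return new Insn(Kind.INVOKE, method, -1, argCount, hasReceiver, returnsValue);
        }
    }

    /** Simulates the operand stack and reports where each call's receiver came from. */
    static List<String> receivers(List<Insn> code) {
        Deque<String> stack = new ArrayDeque<>(); // each entry describes the origin of a value
        List<String> found = new ArrayList<>();
        for (Insn insn : code) {
            switch (insn.kind) {
                case PUSH:
                    stack.push(insn.label);
                    break;
                case LOAD:
                    stack.push("local " + insn.slot);
                    break;
                case STORE:
                    stack.pop();
                    break;
                case INVOKE:
                    // Explicit arguments sit above the receiver, so pop them first.
                    for (int i = 0; i < insn.argCount; i++) {
                        stack.pop();
                    }
                    if (insn.hasReceiver) {
                        found.add(insn.label + " <- " + stack.pop());
                    }
                    if (insn.returnsValue) {
                        stack.push("result of " + insn.label);
                    }
                    break;
            }
        }
        return found;
    }

    /** Hand-simplified version of offsets 20..39 of myMethod from the question. */
    static List<Insn> sample() {
        return Arrays.asList(
            Insn.push("new Cemo"),                      // 20-25: object creation collapsed into one push
            Insn.store(6),                              // 28: astore 6
            Insn.load(0),                               // 30: aload_0
            Insn.load(6),                               // 31: aload 6
            Insn.invoke("isTrue", 0, true, true),       // 33: invokevirtual
            Insn.invoke("valueOf", 1, false, true),     // 36: invokestatic
            Insn.invoke("assertTrue", 1, true, false)); // 39: invokespecial
    }

    public static void main(String[] args) {
        receivers(sample()).forEach(System.out::println);
    }
}
```

For the sample, the tracker attributes isTrue to local variable 6 (the slot holding cemo) and assertTrue to local variable 0 (this). In ASM itself, the tree analysis package (org.objectweb.asm.tree.analysis) provides an Analyzer that can be run with a SourceInterpreter to compute, per instruction, which instructions produced each stack value, including across branches; that is probably a better starting point than hand-rolling a simulation like this one.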