Determine where a catch block ends in ASM

In ASM, I'm trying to determine the labels for a try-catch block.

Currently I have:

import org.objectweb.asm.Label;
import org.objectweb.asm.tree.MethodNode;
import org.objectweb.asm.tree.TryCatchBlockNode;

public void printTryCatchLabels(MethodNode method) {

    if (method.tryCatchBlocks != null) {
        for (TryCatchBlockNode block : method.tryCatchBlocks) {

            Label start = block.start.getLabel();        // first instruction of the protected range
            Label end = block.end.getLabel();            // end of the protected range (exclusive)
            Label catchStart = block.handler.getLabel(); // entry point of the exception handler

            System.out.println("try{      " + start);
            System.out.println("}         " + end);
            System.out.println("catch {   " + catchStart);
            System.out.println("}         " /* where does the catch block end? */);

        }
    }

}

I'm trying to determine where the label for the end of the catch block is, but I don't know how. Why do I need it? Because I want to "remove" try-catch blocks from the bytecode.

For example, I am trying to change:

public void test() {
    try {
        System.out.println("1");
    } catch(Exception e) {
        //optionally rethrow e.
    }
    System.out.println("2");
}

to:

public void test() {
    System.out.println("1");
    System.out.println("2");
}

To remove it, I thought I could just get the labels, remove all instructions between the catch start and the catch end, and then remove all the labels.

Any ideas?

Solution

I recommend reading JVM Spec §3.12, Throwing and Handling Exceptions. It contains a very simple example that nevertheless exhibits the problems with your idea:

Compilation of try-catch constructs is straightforward. For example:

void catchOne() {
    try {
        tryItOut();
    } catch (TestExc e) {
        handleExc(e);
    }
}

is compiled as:

Method void catchOne()
0   aload_0             // Beginning of try block
1   invokevirtual #6    // Method Example.tryItOut()V
4   return              // End of try block; normal return
5   astore_1            // Store thrown value in local var 1
6   aload_0             // Push this
7   aload_1             // Push thrown value
8   invokevirtual #5    // Invoke handler method: 
                        // Example.handleExc(LTestExc;)V
11  return              // Return after handling TestExc
Exception table:
From    To      Target      Type
0       4       5           Class TestExc

Here, the catch block ends with a return instruction and thus does not rejoin the original code flow. This, however, is not required behavior. Instead, the compiled code could have a branch to the last return instruction in place of the return instruction at offset 4:

Method void catchOne()
0:  aload_0
1:  invokevirtual #6   // Method tryItOut:()V
4:  goto          13
7:  astore_1
8:  aload_0
9:  aload_1
10: invokevirtual #5   // Method handleExc:(LTestExc;)V
13: return
Exception table:
From    To      Target      Type
0       4       7           Class TestExc

(e.g. at least one Eclipse version compiled the example exactly this way)

But it could also be vice versa, having a branch to instruction 4 in place of the last return instruction.

Method void catchOne()
0   aload_0
1   invokevirtual #6    // Method Example.tryItOut()V
4   return
5   astore_1
6   aload_0
7   aload_1
8   invokevirtual #5   // Method Example.handleExc(LTestExc;)V
11  goto 4
Exception table:
From    To      Target      Type
0       4       5           Class TestExc

So there are already three ways to compile this simple example, which doesn't even contain any conditionals. The conditional branches associated with loops or if statements do not necessarily point to the instruction right after the conditional block of code either. If that block is followed by another flow control instruction, the conditional branch (the same applies to switch targets) could short-circuit it by jumping directly to that instruction's target.

So it’s very hard to determine which code belongs to a catch block. On the byte code level, it doesn’t even have to be a contiguous block but may be interleaved with other code.
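The exception table itself offers no way out. Each row records only an inclusive start and an exclusive end of the protected range, the handler's entry point and the catch type, and the corresponding fields of ASM's TryCatchBlockNode are likewise start, end, handler and type. The following sketch (Java 17) models the single row of catchOne() with a hypothetical Entry record, not an ASM class, to show that the row protects offsets 0 to 3 and says nothing about where the handler starting at offset 5 ends:

```java
import java.util.List;

public class ExceptionTableDemo {
    // Simplified model of one exception table row (hypothetical, not an ASM or JVM class).
    // Per the JVM spec, the protected range is [from, to): 'to' is exclusive.
    record Entry(int from, int to, int target, String type) {
        boolean covers(int pc) {
            return pc >= from && pc < to;
        }
    }

    public static void main(String[] args) {
        // The single row of catchOne(): From 0, To 4, Target 5, Class TestExc.
        Entry row = new Entry(0, 4, 5, "TestExc");
        for (int pc : List.of(0, 1, 4, 5, 11)) {
            System.out.println("pc " + pc + " protected: " + row.covers(pc));
        }
        // Note: the row has no field for the end of the handler that starts at 'target'.
    }
}
```

Only offsets 0 and 1 print true; the handler code at 5 through 11 is neither protected nor delimited by the row.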

And we haven't even discussed how finally, synchronized or the newer try(…) with-resources statement are compiled. They all produce exception handlers that look like catch blocks at the byte code level.

Since branch instructions within an exception handler may target code outside the handler when recovering from the exception, traversing the code graph of an exception handler doesn't help here: handling such a branch correctly requires exactly the information about the branch target that you are trying to gather.
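To see the problem concretely, take the third compilation of catchOne(), whose handler ends with goto 4. The sketch below (Java 17) uses a hypothetical toy instruction model; Kind, Insn, reachableFrom and catchOneVariant3 are illustrative names, not ASM API. Following normal control flow from the handler entry at offset 5 reaches offset 4, the return that also belongs to the non-exceptional path, so a traversal starting at the handler cannot tell where the handler ends:

```java
import java.util.*;

public class HandlerTraversalDemo {
    // Hypothetical, simplified instruction kinds (not ASM/JVM opcodes):
    // NEXT falls through, GOTO jumps, END is return/athrow (no normal successor).
    enum Kind { NEXT, GOTO, END }

    // One instruction at a bytecode offset; 'target' is only used by GOTO.
    record Insn(int offset, Kind kind, int target) {}

    // Offsets reachable from 'entry' by following normal control flow only.
    // Assumes a NEXT instruction is never the last one in the list.
    static SortedSet<Integer> reachableFrom(List<Insn> code, int entry) {
        Map<Integer, Integer> indexByOffset = new HashMap<>();
        for (int i = 0; i < code.size(); i++) indexByOffset.put(code.get(i).offset(), i);

        SortedSet<Integer> seen = new TreeSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(entry);
        while (!work.isEmpty()) {
            int offset = work.pop();
            if (!seen.add(offset)) continue;           // already visited
            int i = indexByOffset.get(offset);
            Insn insn = code.get(i);
            switch (insn.kind()) {
                case NEXT -> work.push(code.get(i + 1).offset());
                case GOTO -> work.push(insn.target());
                case END -> { }                         // no normal successor
            }
        }
        return seen;
    }

    // Third compilation of catchOne(): the handler ends with "goto 4".
    static List<Insn> catchOneVariant3() {
        return List.of(
            new Insn(0, Kind.NEXT, -1),   // aload_0
            new Insn(1, Kind.NEXT, -1),   // invokevirtual #6
            new Insn(4, Kind.END, -1),    // return
            new Insn(5, Kind.NEXT, -1),   // astore_1 (handler entry)
            new Insn(6, Kind.NEXT, -1),   // aload_0
            new Insn(7, Kind.NEXT, -1),   // aload_1
            new Insn(8, Kind.NEXT, -1),   // invokevirtual #5
            new Insn(11, Kind.GOTO, 4));  // goto 4
    }

    public static void main(String[] args) {
        List<Insn> code = catchOneVariant3();
        SortedSet<Integer> fromHandler = reachableFrom(code, 5);
        SortedSet<Integer> fromStart = reachableFrom(code, 0);
        SortedSet<Integer> shared = new TreeSet<>(fromHandler);
        shared.retainAll(fromStart);
        System.out.println("from handler: " + fromHandler); // [4, 5, 6, 7, 8, 11]
        System.out.println("from start:   " + fromStart);   // [0, 1, 4]
        System.out.println("shared:       " + shared);      // [4]
    }
}
```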

So the only way to handle this task is to do the opposite. You have to traverse the code graph from the beginning of the method, following only the non-exceptional execution, and treat every instruction you encounter as not belonging to an exception handler. For the simple task of stripping exception handlers, this is already sufficient: retain all encountered instructions and drop all others.
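A minimal sketch of that approach (Java 17), again on a hypothetical toy model rather than real ASM classes: starting at the first instruction, follow fall-through and branch successors but never exception edges, keep the instructions reached, and discard the rest together with the exception table. Kind, Insn, Handler, MethodCode, stripHandlers and catchOneEclipse are illustrative names; switches, jsr/ret and offset renumbering are left out. The input is the Eclipse-style compilation of catchOne():

```java
import java.util.*;

public class StripHandlersDemo {
    // Hypothetical, simplified instruction kinds (not ASM/JVM opcodes):
    // NEXT falls through, GOTO jumps, IF may jump or fall through, END is return/athrow.
    enum Kind { NEXT, GOTO, IF, END }

    record Insn(int offset, String text, Kind kind, int target) {}

    // Simplified exception table row; kept only so that it can be discarded.
    record Handler(int from, int to, int target, String type) {}

    record MethodCode(List<Insn> code, List<Handler> handlers) {}

    // Keeps only instructions reached from the first instruction along normal
    // (non-exceptional) control flow; drops everything else and the exception table.
    // Assumes NEXT/IF instructions are never the last one in the list.
    static MethodCode stripHandlers(MethodCode m) {
        List<Insn> code = m.code();
        Map<Integer, Integer> indexByOffset = new HashMap<>();
        for (int i = 0; i < code.size(); i++) indexByOffset.put(code.get(i).offset(), i);

        Set<Integer> reached = new HashSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(code.get(0).offset());
        while (!work.isEmpty()) {
            int offset = work.pop();
            if (!reached.add(offset)) continue;        // already visited
            int i = indexByOffset.get(offset);
            Insn insn = code.get(i);
            if (insn.kind() == Kind.NEXT || insn.kind() == Kind.IF) {
                work.push(code.get(i + 1).offset());   // fall-through successor
            }
            if (insn.kind() == Kind.GOTO || insn.kind() == Kind.IF) {
                work.push(insn.target());              // branch successor
            }
        }

        List<Insn> kept = new ArrayList<>();
        for (Insn insn : code) {
            if (reached.contains(insn.offset())) kept.add(insn);
        }
        return new MethodCode(kept, List.of());
    }

    // The Eclipse-style compilation of catchOne() shown in the answer.
    static MethodCode catchOneEclipse() {
        return new MethodCode(List.of(
                new Insn(0, "aload_0", Kind.NEXT, -1),
                new Insn(1, "invokevirtual #6", Kind.NEXT, -1),
                new Insn(4, "goto 13", Kind.GOTO, 13),
                new Insn(7, "astore_1", Kind.NEXT, -1),
                new Insn(8, "aload_0", Kind.NEXT, -1),
                new Insn(9, "aload_1", Kind.NEXT, -1),
                new Insn(10, "invokevirtual #5", Kind.NEXT, -1),
                new Insn(13, "return", Kind.END, -1)),
            List.of(new Handler(0, 4, 7, "TestExc")));
    }

    public static void main(String[] args) {
        MethodCode stripped = stripHandlers(catchOneEclipse());
        for (Insn insn : stripped.code()) {
            System.out.println(insn.offset() + ": " + insn.text());
        }
        System.out.println("handlers: " + stripped.handlers().size());
    }
}
```

Running it prints offsets 0, 1, 4 and 13 (aload_0, invokevirtual #6, goto 13, return) followed by "handlers: 0". With real ASM, one possible route (an untested outline, not a verified recipe) is to clear method.tryCatchBlocks, run new Analyzer<>(new BasicInterpreter()).analyze(owner, method), and remove the instructions whose returned Frame is null, since Analyzer reports unreachable instructions that way. LabelNodes that local variable or line number entries still refer to should be kept, and writing the class with ClassWriter.COMPUTE_FRAMES lets ASM recompute the stack map frames.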