Search code examples
javabytecodejava-bytecode-asmbyte-buddybytecode-manipulation

Can I insert instructions in constructors before calling this() / super() and before initialising any final fields?


Preface

I have been experimenting with ByteBuddy and ASM, but I am still a beginner in ASM and between beginner and advanced in ByteBuddy. This question is about ByteBuddy and about JVM bytecode limitations in general.

Situation

I had the idea of creating global mocks for testing by instrumenting constructors in such a way that instructions like these are inserted at the beginning of each constructor:

if (GlobalMockRegistry.isMock(getClass()))
  return;

FYI, the GlobalMockRegistry basically wraps a Set<Class<?>> and if that set contains a certain class, then isMock(Class<?>> clazz) would return true. The advantage of that concept is that I can (de)activate global mocking for each class during runtime because if multiple tests run in the same JVM process, one test might need a certain global mock, the next one might not.

What the if(...) return; instructions above want to achieve is that if mocking is active, the constructor should not do anything:

  • no this() or super() calls,update: impossible
  • no field initialisations, → update: possible
  • no other side effects. → update: might be possible, see my update below

The result would be an object with uninitialised fields that did not create any (possibly expensive) side effects such as resource allocation (database connection, file creation, you name it). Why would I want that? Could I not just create an instance with Objenesis and be happy? Not if I want a global mock, i.e. mock objects I cannot inject because they are created somewhere inside methods or field initialisers I do not have control over. Please do not worry about what method calls on such an object would do if its instance fields are not properly initialised. Just assume I have instrumented the methods to return stub results, too. I know how to do that already, the problem are only constructors in the context of this question.

Questions / problems

Now if I try to simulate the desired result in Java source code, I meet the following limitations:

  • I cannot insert any code before this() or super(). I could mitigate that by also instrumenting the super class hierarchy with the same if(...) return;, but would like to know if I could in theory use ASM to insert my code before this() or super() using a method visitor. Or would the byte code of the instrumented class somehow be verified during loading or retransformation and then rejected because the byte code is "illegal"? I would like to know before I start learning ASM because I want to avoid wasting time for an idea which is not feasible.

  • If the class contains final instance fields, I also cannot enter a return before all of those fields have been initialised in the constructor. That might happen at the very end of a complex constructor which performs lots of side effects before actually initialising the last field. So the question is similar to the previous one: Can I use ASM to insert my if(...) return; before any fields (including final ones) are initialised and produce a valid class which I could not produce using javac and will not be rejected when loaded or retransformed?

BTW, if it is relevant, we are talking about Java 8+, i.e. at the time of writing this that would be Java versions 8 to 14.

If anything about this question is unclear, please do not hesitate to ask follow-up questions, so I can improve it.


Update after discussing Antimony's answer

I think this approach could work and avoid side effects, calling the constructor chain but avoiding any side effects and resulting in a newly initialised instance with all fields empty (null, 0, false):

  1. In order to avoid calling this.getClass(), I need to hard-code the mock target's class name directly into all constructors up the parent chain. I.e. if two "global mock" target classes have the same parent class(es), multiple of the following if blocks would be woven into each corresponding parent class, one for each hard-coded child class name.

  2. In order to avoid any side effects from objects being created or methods being called, I need to call a super constructor myself, using null/zero/false values for each argument. That would not matter because the next parent class up the chain would have a similar code block so that the arguments given do not matter anyway.

// Avoid accessing 'this.getClass()'
if (GlobalMockRegistry.isMock(Sub.class)) {
  // Identify and call any parent class constructor, ideally a default constructor.
  // If none exists, call another one using default values like null, 0, false.
  // In the class derived from Object, just call 'Object.<init>'.
  super(null, 0, false);
  return;
}

// Here follows the original byte code, i.e. the normal super/this call and
// everything else the original constructor does.

Note to myself: Antimony's answer explains "uninitialised this" very nicely. Another related answer can be found here.


Next update after evaluating my new idea

I managed to validate my new idea with a proof of concept. As my JVM byte code knowledge is too limited and I am not used to the way of thinking it requires (stack frames, local variable tables, "reverse" logic of first pushing/popping variables, then applying an operation on them, not being able to easily debug), I just implemented it in Javassist instead of ASM, which in comparison was a breeze after failing miserably with ASM after hours of trial & error.

I can take it from here and I want to thank user Antimony for his very instructive answer + comments. I do know that theoretically the same solution could be implemented using ASM, but it would be exceedingly difficult in comparison because its API is too low level for the task at hand. ByteBuddy's API is too high level, Javassist was just right for me in order to get quick results (and easily maintainable Java code) in this case.


Solution

  • Yes and no. Java bytecode is much less restrictive than Java (source) in this regard. You can put any bytecode you want before the constructor call, as long as you don't actually access the uninitialized object. (The only operations allowed on an uninitialized this value are calling a constructor, setting private fields declared in the same class, and comparing it against null).

    Bytecode is also more flexible in where and how you make the constructor call. For example, you can call one of two different constructors in an if statement, or you can wrap the super constructor call in a "try block", both things that are impossible at the Java language level.

    Apart from not accessing the uninitialized this value, the only restriction* is that the object has to be definitely initialized along any path that returns from the constructor call. This means the only way to avoid initializing the object is to throw an exception. While being much laxer than Java itself, the rules for Java bytecode were still very deliberately constructed so it is impossible to observe uninitialized objects. In general, Java bytecode is still required to be memory safe and type safe, just with a much looser type system than Java itself. Historically, Java applets were designed to run untrusted code in the JVM, so any method of bypassing these restrictions was a security vulnerability.

    * The above is talking about traditional bytecode verification, as that is what I am most familiar with. I believe stackmap verification behaves similarly though, barring implementation bugs in some versions of Java.

    P.S. Technically, Java can have code execute before the constructor call. If you pass arguments to the constructor, those expressions are evaluated first, and hence the ability to place bytecode before the constructor call is required in order to compile Java code. Likewise, the ability to set private fields declared in the same class is used to set synthetic variables that arise from the compilation of nested classes.

    If the class contains final instance fields, I also cannot enter a return before all of those fields have been initialised in the constructor.

    This, however, is eminently possible. The only restriction is that you call some constructor or superconstructor on the uninitialized this value. (Since all constructors recursively have this restriction, this will ultimately result in java.lang.Object's constructor being called). However, the JVM doesn't care what happens after that. In particular, it only cares that the fields have some well typed value, even if it is the default value (null for objects, 0 for ints, etc.) So there is no need to execute the field initializers to give them a meaningful value.

    Is there any other way to get the type to be instantiated other than this.getClass() from a super class constructor?

    Not as far as I am aware. There's no special opcode for magically getting the Class associated with a given value. Foo.class is just syntactic sugar which is handled by the Java compiler.