Tags: java, performance, jvm, refactoring, escape-analysis

Can the JVM omit short-lived object creation so I can refactor without hurting performance?


Sometimes during the course of an algorithm we need to compute or simply store several values that depend on each other or don't make any sense apart from one another.

As a (quite nonsensical but, importantly, simple) example, let's find the two distinct values in an int[] that are closest to the number 3:

int a = values[0];
int b = values[1];
for (int value : values) {
    int distance = Math.abs(value - 3);
    if (value != b && distance < Math.abs(a - 3)) {
        a = value;
    }
    if (value != a && distance < Math.abs(b - 3)) {
        b = value;
    }
}
takeSausageSliceBetween(a, b);

We can't just make methods computeA() and computeB() in the same class, because the computations for a and b are interdependent. So computing a and b is just asking to be refactored into a separate class:

class DistinctNumbersNearestTo3 {
    final int a;
    final int b;
    DistinctNumbersNearestTo3(int[] values) {
        // same algorithm
    }
}
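Filling in the constructor with the same loop from above gives the following sketch (the locals deliberately shadow the final fields until the final assignments at the end):

```java
class DistinctNumbersNearestTo3 {
    final int a;
    final int b;

    DistinctNumbersNearestTo3(int[] values) {
        // Same algorithm as the inline version, now encapsulated.
        int a = values[0];
        int b = values[1];
        for (int value : values) {
            int distance = Math.abs(value - 3);
            if (value != b && distance < Math.abs(a - 3)) {
                a = value;
            }
            if (value != a && distance < Math.abs(b - 3)) {
                b = value;
            }
        }
        this.a = a;
        this.b = b;
    }
}
```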

so the higher level code becomes nice and clear:

DistinctNumbersNearestTo3 nearest = new DistinctNumbersNearestTo3(values);
takeSausageSliceBetween(nearest.a, nearest.b);
// nearest never leaves the method it was declared in

However (please correct me if I'm wrong), at least at some optimisation levels, this introduces the instantiation and then garbage collection (only from eden, but still) of a new object, just for the sake of clearer code. Also, reading an int from the stack turns into reading an object's field from the heap.

The question here is: is the JVM smart enough to eventually optimize the refactored code so that it performs just like the unrefactored code?

Another case where I'd like to bunch several variables together into a new object, just for cleaner code, is when I want to refactor a long method a() into several new methods by introducing a new class C that holds data specific to a()'s invocation, and therefore contains those several new methods operating on that data. I stumbled upon such a case when implementing a class that represents a set of chains, to which we can add new links (pairs of nodes), thereby extending an existing chain, combining two chains into one, or creating a new chain of two nodes. Let's discuss that particular class just for some perspective:

public class SetOfChains {
    List<List<Node>> chains;
    public void addLink(Node a, Node b) {
        // Very long method that mutates field this.chains. Needs refactoring.
    }
}

To properly split addLink() into several methods of SetOfChains, each of those new methods would need a lot of parameters, because data specific to the addLink() invocation is needed at every step. Saving that data in dedicated fields of SetOfChains would also be possible, but smelly and inexpressive (again, it only makes sense during the method invocation). So the obvious move is to create a new inner class that holds that data in its fields and performs all the algorithm steps, mutating the outer chains field just as the unrefactored addLink() does:

class SetOfChains {
    List<List<Node>> chains;

    void addLink(Node a, Node b) {
        new LinkAddition(a, b).compute();
    }

    class LinkAddition { // Instances never leave addLink()'s scope
        private final Something bunch, of, common, fields;

        void compute() {
            this.properly();
            this.isolated();
            this.steps();
            this.of();
            this.the();
            this.refactored();
            this.method();
        }
        // There be code for the properly isolated steps...
    }
}

But since functionally that is equivalent to my old long unrefactored addLink(), can the JVM make the resulting instructions as optimal as in the unrefactored case, as if there were no LinkAddition objects at all?

So, to draw the bottom line under this wall of text: if I strive for easy-to-understand code by extracting data and methods into new classes rather than just new methods, does that necessarily decrease performance?

And do I understand correctly that the described cases are exactly what escape analysis is about?


Solution

  • It's been said that "premature optimization is the root of all evil".

    In general, it is better to make the code more understandable and only make it more complicated and harder to understand once a performance bottleneck is detected.

Most Java code creates tons of short-lived objects that become unreachable quickly. For that reason, the HotSpot garbage collector is "generational", meaning it divides the heap into two main areas: the young generation and the old generation.

    • The young generation - contains most newly created objects, which are expected to be short-lived. Garbage-collecting this area is very efficient.

    • The old generation - longer-lived objects are eventually "promoted" from the young generation into the old generation. Garbage-collecting this area is more expensive.
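To the asker's "eden" point: allocating a short-lived object in the young generation is typically just a pointer bump in a thread-local allocation buffer, so churn like the following is far cheaper than heap-allocation intuition from C/C++ suggests. The Pair class and churn() helper below are illustrative, not from the question:

```java
public class YoungGenChurn {
    static final class Pair {
        final int a, b;
        Pair(int a, int b) { this.a = a; this.b = b; }
    }

    // Allocates n short-lived Pairs; each becomes unreachable immediately
    // after use, so all of them die young.
    static long churn(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Pair p = new Pair(i, i + 1);
            sum += p.a + p.b;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Millions of Pairs die in eden; the generational collector reclaims
        // them cheaply, and escape analysis may avoid allocating them at all.
        System.out.println(churn(5_000_000));
    }
}
```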

So I don't think your fear of creating too many short-lived objects is an issue, since the garbage collector is designed specifically for this use case.

Also, simple code is easier for HotSpot's JIT compiler to analyze and potentially inline, unroll, or otherwise optimize, which again favors many simple objects and methods.
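And yes, this is where escape analysis comes in: when the JIT compiler can prove an object never escapes a compiled method, it may scalar-replace it, keeping the fields in registers with no heap allocation at all. A minimal sketch (the EscapeDemo and Point classes are illustrative; -XX:+DoEscapeAnalysis is HotSpot's actual switch, on by default in modern JVMs):

```java
public class EscapeDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // The Point never escapes this method, so after JIT compilation
    // HotSpot may scalar-replace it: x and y become plain values and
    // no Point is ever allocated on the heap.
    static long distanceSquared(int x, int y) {
        Point p = new Point(x, y);
        return (long) p.x * p.x + (long) p.y * p.y;
    }

    public static void main(String[] args) {
        long sum = 0;
        // A hot loop, so the JIT compiles distanceSquared and escape
        // analysis has a chance to kick in.
        for (int i = 0; i < 10_000_000; i++) {
            sum += distanceSquared(i, i + 1);
        }
        System.out.println(sum);
    }
}
```

Running this with -XX:-DoEscapeAnalysis versus the default and comparing allocation rates (e.g. via -Xlog:gc) is one way to observe the difference, though JIT behavior is never guaranteed.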

So to summarize: go ahead and write the easier-to-understand, easier-to-maintain code that makes lots of short-lived objects. You shouldn't see much of a performance decrease. Profile it to see the actual performance; if it is not to your liking, refactor the bottlenecks to make them faster.
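If you do profile, a crude comparison harness might look like the sketch below. The timeIt helper is hypothetical; for trustworthy numbers use a proper tool such as JMH, which handles warmup, dead-code elimination, and statistics for you:

```java
public class TimingSketch {
    // Hypothetical helper: crudely warms up the JIT, then times one run.
    // This is only a rough sketch, not a rigorous benchmark.
    static long timeIt(Runnable task) {
        for (int i = 0; i < 10; i++) {
            task.run(); // warmup, so the JIT has a chance to compile
        }
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long[] sink = new long[1]; // sink defeats trivial dead-code removal
        long nanos = timeIt(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                sink[0] += i; // stand-in for refactored vs. unrefactored code
            }
        });
        System.out.println(nanos + " ns");
    }
}
```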