java, optimization, compilation, bytecode

Should I stop using local variables in Java?


I have these two snippets, in Java and C++, which are supposed to do the same thing.

My intuition was that the size (and also the content) of the object code would be the same for R1 and R2. That is the case for the C++ (a difference of 4 bytes if compiled without -O1). The difference is bigger for the Java bytecode (R2 is longer), which surprised me.

Maybe I'm not looking at the right things and my question might not be relevant, but is it normal that the Java bytecode stays so "close" to the source code, and does it mean that it's always more "efficient"/"optimized" to write everything in one line instead of using local variables?

C++

int A(int a) { return 0; }
int B(int b) { return 0; }
int C(int c) { return 0; }
int D(int d) { return 0; }

int R1() {
  return  A(B(C(3)+D(3)));
}

int R2() {
  int d = D(3);
  int c = C(3);
  int b = B(c + d);
  return A(b);
}

// Then R1() and R2() are called in the main()

Java

class MyClass {
  static int A(int a) { return 0; }
  static int B(int b) { return 0; }
  static int C(int c) { return 0; }
  static int D(int d) { return 0; }
  
  static int R1() {
    return  A(B(C(3)+D(3)));
  }
  
  static int R2() {
    int d = D(3);
    int c = C(3);
    int b = B(c + d);
    
    return A(b);
  }

  // Then R1 and R2 are called in main()
}

When I compiled both of them (g++ version 9.4 with -O1, and javac version 11.0.17) and disassembled R1 and R2, I got this:

C++ (g++ -O1 prog.cpp)

R1:
  1251: f3 0f 1e fa             endbr64 
  1255: 53                      push   %rbx
  1256: bf 03 00 00 00          mov    $0x3,%edi
  125b: e8 9d ff ff ff          callq  11fd <_Z1Ci>
  1260: 89 c3                   mov    %eax,%ebx
  1262: bf 03 00 00 00          mov    $0x3,%edi
  1267: e8 bb ff ff ff          callq  1227 <_Z1Di>
  126c: 8d 3c 03                lea    (%rbx,%rax,1),%edi
  126f: e8 5f ff ff ff          callq  11d3 <_Z1Bi>
  1274: 89 c7                   mov    %eax,%edi
  1276: e8 2e ff ff ff          callq  11a9 <_Z1Ai>
  127b: 5b                      pop    %rbx
  127c: c3                      retq   

R2:
  <exact same as R1>

Java (javap -c MyClass)

static int R1();
  Code:
     0: iconst_3
     1: invokestatic  #8                  // Method C:(I)I
     4: iconst_3
     5: invokestatic  #9                  // Method D:(I)I
     8: iadd
     9: invokestatic  #10                 // Method B:(I)I
    12: invokestatic  #11                 // Method A:(I)I
    15: ireturn

static int R2();
  Code:
     0: iconst_3
     1: invokestatic  #9                  // Method D:(I)I
     4: istore_0
     5: iconst_3
     6: invokestatic  #8                  // Method C:(I)I
     9: istore_1
    10: iload_1
    11: iload_0
    12: iadd
    13: invokestatic  #10                 // Method B:(I)I
    16: istore_2
    17: iload_2
    18: invokestatic  #11                 // Method A:(I)I
    21: ireturn

Solution

  • No. The detail you're missing is where the optimisation happens.

    In C-land, the application that reads in your source code (gcc, for example) is the one that does most of the optimization (though more and more is done by the CPU itself, in its pipeline and microcode translation engines - not that there's a heck of a lot you can do to affect this as a programmer). Hence it's that application (gcc) that has an -O (optimization level) option, and it's that application that will potentially churn through a ton of CPU cycles analysing your code to death.

    In Java-land, that is not how it works. javac is on rails: the spec decrees exactly what it should generate, pretty much down to the byte. That's in contrast to C-land, where the spec is stacked to the gills with 'may's and 'could's - compilers have a ton of leeway, notably including the bit width of your basic 'word'. Java locks all of that down in the spec, regardless of the bit width of the CPU architecture you end up running on.
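You can see how literal javac's translation is with a tiny experiment (the class and method names below are made up for illustration): javac folds compile-time constant expressions, but it does essentially nothing else - method calls are emitted exactly as written, with no inlining.

```java
// Literal.java -- a minimal sketch; compile with `javac Literal.java`,
// then inspect the bytecode with `javap -c Literal`.
class Literal {
    // 2 + 3 is a compile-time constant expression, so javac folds it:
    // the bytecode is just iconst_5 / ireturn.
    static int folded() { return 2 + 3; }

    static int id(int x) { return x; }

    // id(2) + id(3) is NOT a constant expression: javac emits the two
    // invokestatic calls and an iadd, exactly as written in the source.
    static int notFolded() { return id(2) + id(3); }

    public static void main(String[] args) {
        System.out.println(folded());    // 5
        System.out.println(notFolded()); // 5
    }
}
```

Both methods return the same value, but only the constant expression is simplified at compile time; everything else is left for the runtime to optimize.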

    The optimization is done by java.exe - the runtime. And its approach is, as a rule, more effective than what C can ever do, because unlike a C compiler, the runtime gets to observe 'live behaviour' - something a C compiler cannot do (which is why C compilers tend to have a lot of 'hinting' systems, where you can tell the compiler what you suspect the runtime behaviour is likely to be).

    All modern JVMs work by first running the code inefficiently (that seemingly inefficient code that javac produced, which you noticed with javap -c) - and in fact even more inefficiently than that, because the JVM adds a bunch of hooks for bookkeeping. For example, the JVM tracks how often a method is invoked, how long it takes to run, and, for each if, how often each branch is taken. All of that makes it run even slower.
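You can watch this bookkeeping pay off on a HotSpot JVM with the real -XX:+PrintCompilation flag, which logs methods as the JIT compiles them. The class below is a made-up sketch: run it as `java -XX:+PrintCompilation Hot` and, after enough invocations, `Hot.work` should appear in the log (the log format varies between JVM versions, so don't script against it).

```java
// Hot.java -- run with `java -XX:+PrintCompilation Hot` (HotSpot JVMs)
// to watch the JIT promote the hot method once its invocation counters
// cross the compilation thresholds.
class Hot {
    // A cheap method the loop below makes "hot".
    static int work(int x) { return (x * 31) ^ (x >>> 3); }

    public static void main(String[] args) {
        long sum = 0;
        // Enough iterations for the JVM's invocation counters to
        // trigger JIT compilation of work().
        for (int i = 0; i < 1_000_000; i++) sum += work(i);
        System.out.println(sum);
    }
}
```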

    The JVM does this because, for 99% (literally) of all programs out there, about 99% (again literally, not an exaggeration) of CPU/memory is spent on less than 1% of the code - the trick is predicting which 1% that is. With all that bookkeeping, Java knows. It then analyses the bytecode of that 1%, and gets to use the runtime behaviour observed so far to come up with finely tuned machine code. This means Java gets to emit code that branch-predicts well (ensuring the machine code does not have to jump around for the most often taken branch path) - except it's not a prediction: java.exe knows which path is the most often taken one. Versus gcc, which has to guess, optionally assisted by the programmer with branch hints in the source file.

    That's just one of thousands of places where java.exe can apply machine code optimization.

    That still means ~99% of the code runs very inefficiently. But, given that this takes less than 1% of CPU/memory, it just doesn't matter.

    Java is slower than C due to various factors, but 'optimizing instructions' is not one of them:

    • Java cannot use architecture-specific features that affect the underlying language model, such as the 80-bit-wide variables of what used to be the coprocessor. Project Valhalla is trying to fix that.
    • More generally, Java has a hard time interfacing directly with arch/OS-specific low-level APIs.
    • That bookkeeping, and the garbage collector, tend to have 'ramp-up time': they start off slow and become faster over time. Versus C code, which pretty much springs into existence running as fast as it ever will.
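The ramp-up is easy to glimpse with a crude sketch like the one below (a made-up class, not a proper benchmark): the same loop is timed in batches, and on a typical HotSpot JVM the later batches usually run faster once the JIT has kicked in. Timings are noisy and machine-dependent; for real measurements, use a harness such as JMH.

```java
// Warmup.java -- rough illustration of JIT ramp-up. The same work is
// timed in several batches; later batches are usually faster because
// the JVM has compiled the hot code by then. NOT a rigorous benchmark.
class Warmup {
    static int body(int x) { return Integer.bitCount(x * 0x9E3779B9); }

    public static void main(String[] args) {
        for (int batch = 0; batch < 5; batch++) {
            long t0 = System.nanoTime();
            int acc = 0;
            for (int i = 0; i < 5_000_000; i++) acc += body(i);
            long t1 = System.nanoTime();
            // acc is printed so the JIT cannot eliminate the loop as dead code.
            System.out.println("batch " + batch + ": "
                + (t1 - t0) / 1_000_000 + " ms (acc=" + acc + ")");
        }
    }
}
```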

    Naturally, then, Java is neither popular nor a good idea for simple one-off command-line tools like ls or /bin/true. Java is fantastic at performance when writing, say, a web request responder: those run for a long time, and the hotspot process really helps there.