I have these two pieces of code in Java and C++, which are supposed to do the same thing.
My intuition was that the size (and also the content) of the object code would be the same for R1 and R2. That is the case for the C++ (a difference of 4 bytes if compiled without -O1). There is a bigger difference for the Java bytecode (R2 is longer), which surprised me.
Maybe I'm not looking at the right things and my question might not be relevant, but is it normal for the Java bytecode to be so "close" to the source code, and does that mean it's always more "efficient"/"optimized" to write everything in one line instead of using local variables?
C++
int A(int a) { return 0; }
int B(int b) { return 0; }
int C(int c) { return 0; }
int D(int d) { return 0; }
int R1() {
    return A(B(C(3)+D(3)));
}
int R2() {
    int d = D(3);
    int c = C(3);
    int b = B(c + d);
    return A(b);
}
// Then R1() and R2() are called in the main()
Java
class MyClass {
    static int A(int a) { return 0; }
    static int B(int b) { return 0; }
    static int C(int c) { return 0; }
    static int D(int d) { return 0; }
    static int R1() {
        return A(B(C(3)+D(3)));
    }
    static int R2() {
        int d = D(3);
        int c = C(3);
        int b = B(c + d);
        return A(b);
    }
    // Then R1 and R2 are called in main()
}
When I compiled both of them (g++ -O1, version 9.4, and javac, version 11.0.17) and disassembled R1 and R2, I got this:
C++ (g++ -O1 prog.cpp)
R1:
1251: f3 0f 1e fa endbr64
1255: 53 push %rbx
1256: bf 03 00 00 00 mov $0x3,%edi
125b: e8 9d ff ff ff callq 11fd <_Z1Ci>
1260: 89 c3 mov %eax,%ebx
1262: bf 03 00 00 00 mov $0x3,%edi
1267: e8 bb ff ff ff callq 1227 <_Z1Di>
126c: 8d 3c 03 lea (%rbx,%rax,1),%edi
126f: e8 5f ff ff ff callq 11d3 <_Z1Bi>
1274: 89 c7 mov %eax,%edi
1276: e8 2e ff ff ff callq 11a9 <_Z1Ai>
127b: 5b pop %rbx
127c: c3 retq
R2:
<exact same as R1>
Java (javap -c MyClass)
static int R1();
Code:
0: iconst_3
1: invokestatic #8 // Method C:(I)I
4: iconst_3
5: invokestatic #9 // Method D:(I)I
8: iadd
9: invokestatic #10 // Method B:(I)I
12: invokestatic #11 // Method A:(I)I
15: ireturn
static int R2();
Code:
0: iconst_3
1: invokestatic #9 // Method D:(I)I
4: istore_0
5: iconst_3
6: invokestatic #8 // Method C:(I)I
9: istore_1
10: iload_1
11: iload_0
12: iadd
13: invokestatic #10 // Method B:(I)I
16: istore_2
17: iload_2
18: invokestatic #11 // Method A:(I)I
21: ireturn
No. The detail you're missing is where the optimization happens.
In C-land, the application that reads in your source code (gcc, for example) is the one that does most of the optimization (though more and more is done by the CPU itself, in its pipeline and microcode translation engines - not that there's a heck of a lot you can do to affect this as a programmer). Hence, it's that application (gcc) that has an -O (optimization level) option, and it's that application that will potentially churn through a ton of CPU cycles analysing your code to death.
In java-land, that is not how it works. javac is on rails: the spec decrees exactly what it should generate, pretty much down to the byte. That's in contrast to C-land, where the spec is stacked to the gills with 'mays' and 'coulds' - compilers have a ton of leeway, notably including the bit width of your basic 'word' - whereas java locks all of that down in the spec, regardless of the bit width of the CPU architecture you end up running on.
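As a small illustration (the class and method names here are mine, not from the question): about the only "optimization" javac performs is folding compile-time constant expressions. Method calls are always emitted exactly as written, which is why the question's R2 bytecode keeps every istore/iload.

```java
// Sketch, with a hypothetical class name: javac folds constant expressions
// at compile time, but never inlines or reorders method calls.
class JavacSketch {
    static int folded() {
        return 2 + 3;               // compiled to a single iconst_5
    }
    static int notFolded() {
        return folded() + folded(); // two invokestatic calls remain in the bytecode
    }
    public static void main(String[] args) {
        System.out.println(notFolded()); // prints 10
    }
}
```

Running javap -c on the compiled class would show iconst_5 in folded() and two untouched invokestatic instructions in notFolded().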
The optimization is done by java.exe - the runtime. And its approach is, as a rule, more efficient than C's can ever be, because unlike a C compiler, the runtime gets the benefit of observing 'live behaviour' - not something a C compiler can do (that's why C compilers tend to have a lot of 'hinting' systems, where you can inform the compiler about what you suspect the runtime behaviour is likely to be).
All modern JVMs work by running code inefficiently (that seemingly inefficient code that javac produced, which you noticed with javap -c), and in fact run it even more inefficiently than that, as the JVM adds a bunch of hooks for bookkeeping. For example, the JVM tracks how often a method is invoked, how long it takes to run, and, for each if, how often each branch is taken. All of this makes the code run even slower.
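A minimal sketch of that invocation counting in action (the class name and loop count are my choices; -XX:+PrintCompilation is a real HotSpot flag):

```java
// Sketch: HotSpot counts invocations per method; once a method has been
// called often enough, it is handed to the JIT compiler. Run with
//   java -XX:+PrintCompilation HotSketch
// to watch 'hot' appear in the compilation log partway through the loop.
class HotSketch {
    static int hot(int x) {
        return x * 31 + 7;  // trivial, but invoked often enough to become 'hot'
    }
    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 200_000; i++) {  // well past the default thresholds
            sum += hot(i);
        }
        System.out.println(sum);
    }
}
```

The interesting part is not the arithmetic but the log: the interpreter runs hot() at first, and only after the counters cross a threshold does compiled machine code take over.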
The JVM does this because, for 99% (literally) of all programs out there, about 99% (again literally, not an exaggeration) of CPU/memory is 'spent' on less than 1% of the code - the trick is predicting which 1% that is. With all that bookkeeping, java knows, and will then analyse the bytecode of that 1%, using the runtime behaviour observed so far to come up with fine-tuned machine code. This means java gets to write code that branch-predicts well (ensuring the machine code does not have to jump around for the most often taken branch path) - except it's not a prediction: java.exe knows which path is the 'most often taken path', versus gcc, which has to guess, optionally assisted by the programmer with branch hints in the source file.
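Seen from the Java side (names are mine), that looks like this: there is no Java equivalent of gcc's __builtin_expect, because none is needed - the branch counters collected at runtime already tell HotSpot which side is hot.

```java
// Sketch: HotSpot records how often each side of this 'if' is taken.
// After enough calls it knows the first branch wins 999 times out of 1000,
// and it can lay out the machine code with that path as the straight-line
// fall-through - no source-level hint required.
class BranchSketch {
    static int mostlyTaken(int i) {
        if (i % 1000 != 0) {  // true for 999 out of every 1000 values of i
            return i + 1;     // hot path
        }
        return -i;            // rare path
    }
    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += mostlyTaken(i);
        }
        System.out.println(sum);
    }
}
```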
That's just one of thousands of places where java.exe can apply machine code optimization.
That still means ~99% of the code runs very inefficiently. But, given that this takes less than 1% of CPU/memory, it just doesn't matter.
Java is slower than C due to various factors, but 'optimizing instructions' is not one of them.
Naturally, then, java is neither popular nor a good idea for simple one-off command line tools like ls or /bin/true. Java is fantastic at performance when writing, say, a web request responder. Those run for a long time, and that hotspot process really helps there.