java, compilation, compiler-theory

Compilation vs translation, "compiling" Java to bytecode?


My understanding rests on these definitions:

Translation - taking code in one language and generating code in another language.

Compilation - translation to machine code.

Machine code - direct instructions for the CPU.

Now, from docs.oracle.com:

javac - Java programming language compiler

Compiler...? I think it is a Java translator, because it generates code that is not machine code. Bytecode needs an interpreter (the JVM) to run, so it's definitely not machine code.

From Wikipedia:

Java applications are typically compiled to bytecode

Similarly, according to my definitions, I would say that Java is translated to bytecode. There are many more examples like this on the Internet, so either there is widespread confusion about this or I'm missing something.

Could you please clarify this? What is the difference between translation and compilation?


Solution

  • It's all a matter of definitions, and there's no single accepted definition of what "compilation" means. In your eyes, compilation is transforming source code in some language to native machine code, so a transformation process which doesn't generate machine code shouldn't be called "compilation". In my eyes (and apparently in the eyes of the javac documentation writers as well), it should.

    There are actually a lot of different terms: translation, compilation, decompilation, assembly, disassembly, and more.

    Personally, I'd say it makes sense to group all of these terms under "compilation", because all these processes have a lot in common:

    • They transform code in one formal language to code in another formal language.
    • They try to preserve the semantics of the input code as much as possible.
    • They all share a very similar design, with a front-end, a back-end, and possibly an optimizer in the middle. I've seen the internals of both javac and native compilers, and they are relatively similar.
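To make the front-end / optimizer / back-end split concrete, here is a minimal sketch of a toy compiler. Everything in it is made up for illustration: the `TinyCompiler` class, the expression grammar (integers, variables, `+`, `*`, parentheses), and the `PUSH`/`LOAD`/`ADD`/`MUL` instruction names are assumptions, not javac's real design or real JVM opcodes. It compiles an arithmetic expression to instructions for an imaginary stack machine, which a small interpreter then runs, roughly the way the JVM runs bytecode.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

class TinyCompiler {
    /** AST node: kind is 'n' (number), 'v' (variable), '+' or '*'. */
    static final class Node {
        final char kind; final int value; final String name; final Node left, right;
        Node(char kind, int value, String name, Node left, Node right) {
            this.kind = kind; this.value = value; this.name = name;
            this.left = left; this.right = right;
        }
        static Node num(int v) { return new Node('n', v, null, null, null); }
        static Node var(String s) { return new Node('v', 0, s, null, null); }
        static Node bin(char op, Node l, Node r) { return new Node(op, 0, null, l, r); }
    }

    private final String src;
    private int pos = 0;
    private TinyCompiler(String src) { this.src = src.replaceAll("\\s+", ""); }

    private boolean peek(char c) { return pos < src.length() && src.charAt(pos) == c; }

    // Front-end: recursive-descent parser; '*' binds tighter than '+'.
    private Node sum() {
        Node n = product();
        while (peek('+')) { pos++; n = Node.bin('+', n, product()); }
        return n;
    }
    private Node product() {
        Node n = atom();
        while (peek('*')) { pos++; n = Node.bin('*', n, atom()); }
        return n;
    }
    private Node atom() {
        if (peek('(')) {
            pos++;
            Node n = sum();
            if (!peek(')')) throw new IllegalArgumentException("')' expected at " + pos);
            pos++;
            return n;
        }
        int start = pos;
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("operand expected at " + pos);
        String tok = src.substring(start, pos);
        return Character.isDigit(tok.charAt(0)) ? Node.num(Integer.parseInt(tok)) : Node.var(tok);
    }
    static Node parse(String src) {
        TinyCompiler p = new TinyCompiler(src);
        Node n = p.sum();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("unexpected input at " + p.pos);
        return n;
    }

    // Optional middle stage: fold operations whose operands are both constants.
    static Node fold(Node n) {
        if (n.left == null) return n; // number or variable: nothing to fold
        Node l = fold(n.left), r = fold(n.right);
        if (l.kind == 'n' && r.kind == 'n')
            return Node.num(n.kind == '+' ? l.value + r.value : l.value * r.value);
        return Node.bin(n.kind, l, r);
    }

    // Back-end: post-order walk that emits instructions for the toy stack machine.
    static void emit(Node n, List<String> out) {
        switch (n.kind) {
            case 'n': out.add("PUSH " + n.value); break;
            case 'v': out.add("LOAD " + n.name); break;
            default:
                emit(n.left, out);
                emit(n.right, out);
                out.add(n.kind == '+' ? "ADD" : "MUL");
        }
    }

    static List<String> compile(String src) {
        List<String> out = new ArrayList<>();
        emit(fold(parse(src)), out);
        return out;
    }

    // Interpreter for the target language, playing the role the JVM plays for bytecode.
    static int run(List<String> code, Map<String, Integer> vars) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String ins : code) {
            String[] parts = ins.split(" ");
            switch (parts[0]) {
                case "PUSH": stack.push(Integer.parseInt(parts[1])); break;
                case "LOAD": stack.push(vars.get(parts[1])); break;
                case "ADD": stack.push(stack.pop() + stack.pop()); break;
                case "MUL": stack.push(stack.pop() * stack.pop()); break;
                default: throw new IllegalStateException("unknown instruction " + ins);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        List<String> code = compile("2 * 3 + x * (4 + 1)");
        System.out.println(code);                      // [PUSH 6, LOAD x, PUSH 5, MUL, ADD]
        System.out.println(run(code, Map.of("x", 10))); // 56
    }
}
```

The output is not machine code for any real CPU, yet the program has the same overall shape a native compiler would have: only `emit` would need to change to target a different instruction set.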

    In addition, your definition of "produces native code" is problematic:

    • What about compilers that generate assembly but leave the conversion to machine code to an external program (commonly called an "assembler")? Would you deny them the name "compiler" because of that last, comparatively insignificant step?
    • How do you even define "machine code"? What if tomorrow someone builds a processor that runs Java bytecode natively?

    But these are just my opinions. I think the most widely accepted definitions out there are:

    • Compilation is transforming code in a higher-level language to a lower-level one. Examples: Java to Java Bytecode, or C to x86 machine code.
    • Decompilation is transforming code in a lower-level language to a higher-level one; in effect, the opposite of compilation. Example: Java Bytecode to Java.
    • Translation or source-to-source compilation is transforming code in some language to another language of a comparable "level". Examples: ARM to x86, or C to Java. When the two languages are actually different versions of the same language (e.g. ECMAScript 6 to ECMAScript 5), the term transpiler is also used.
    • Assembly is transforming code in some assembly language to machine code.
    • Disassembly is either a synonym for decompilation or the opposite of assembly, depending on the context.

    Under these definitions, javac can definitely be considered a compiler. But again, it's all in the definitions: from a technical standpoint, many of these processes have a lot in common.
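If you want to see for yourself that javac's output is a separate, lower-level formal language rather than source text or native instructions, you can look at the first bytes of a class file. Per the JVM specification, every class file starts with the magic number 0xCAFEBABE followed by a minor and a major version number. The sketch below reads that header; the `ClassFileHeader` class and its `read` helper are names made up for this example, and it inspects `java.lang.String` simply because that class file ships with every JDK.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class ClassFileHeader {
    /** Returns {magic, minor, major} read from the compiled form of the given class. */
    static int[] read(Class<?> c) throws IOException {
        // Absolute resource name of the class file, e.g. /java/lang/String.class
        String resource = "/" + c.getName().replace('.', '/') + ".class";
        try (InputStream in = c.getResourceAsStream(resource)) {
            if (in == null) throw new IOException("class file not found: " + resource);
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();           // u4 magic
            int minor = data.readUnsignedShort(); // u2 minor_version
            int major = data.readUnsignedShort(); // u2 major_version
            return new int[] { magic, minor, major };
        }
    }

    public static void main(String[] args) throws IOException {
        int[] h = read(String.class);
        // The version numbers printed depend on the JDK you run this with.
        System.out.printf("magic=0x%08X minor=%d major=%d%n", h[0], h[1], h[2]);
    }
}
```

The bytes after this header encode the constant pool, fields, and methods, with method bodies stored as JVM instructions. Whether you call producing that file "compilation" or "translation" is, again, a matter of definition.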