Tags: assembly, virtual-machine, bytecode, cil, machine-code

Are Bytecode and Assembly Language the same thing?


The question might seem odd, but I am still trying to grasp the concepts of virtual machines. I have read several answers, but I still don't get whether Java bytecode (and MSIL as well) is the same as assembly language. As far as I understand, both bytecode and assembly get compiled to machine code, so in terms of abstraction they are at the same level, i.e. one step above machine code. So is bytecode just an assembly language, i.e. a human-readable form of machine code? If yes, then why is assembly language still used? Why not program in bytecode (which is portable across different machines) instead of assembly language (which is specific to a single machine architecture)? Thanks


Solution

  • No.

    Java bytecode is a binary programming language, not a "human-readable form", unless you consider a bunch of numbers readable, or you use a disassembler to turn it back into bytecode text mnemonics, or a decompiler to get back something close to the Java source itself.
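
    To make "a bunch of numbers" concrete, here is a minimal sketch of what a disassembler does. The class name `TinyDisassembler` is made up for illustration and it handles only four opcodes; the opcode values themselves (`0x05`, `0x10`, `0x68`, `0xAC`) are the ones the JVM instruction set assigns to these mnemonics.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical helper: maps a handful of raw JVM opcodes back to text mnemonics.
    class TinyDisassembler {
        static List<String> disassemble(byte[] code) {
            List<String> out = new ArrayList<>();
            int pc = 0;
            while (pc < code.length) {
                int opcode = code[pc++] & 0xFF; // bytes are signed in Java, so mask to 0..255
                switch (opcode) {
                    case 0x05: out.add("iconst_2"); break;
                    case 0x10: out.add("bipush " + code[pc++]); break; // followed by one signed operand byte
                    case 0x68: out.add("imul"); break;
                    case 0xAC: out.add("ireturn"); break;
                    default:   out.add(String.format("<unknown 0x%02X>", opcode));
                }
            }
            return out;
        }

        public static void main(String[] args) {
            // The "unreadable" binary form of: push 42, push 2, multiply, return.
            byte[] code = {0x10, 0x2A, 0x05, 0x68, (byte) 0xAC};
            disassemble(code).forEach(System.out::println);
        }
    }
    ```

    For real class files, the `javap -c` tool shipped with the JDK prints this kind of listing.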

    Assembly is usually the text mnemonics of the target machine's actual instructions, mapped 1:1, so one instruction in the assembler source translates directly into one machine code instruction. Some exceptions exist with certain CPUs and assemblers: for example, many RISC assemblers will translate "load register with immediate value" into multiple instructions as needed, because the native machine code can load only part of the bits at a time and the whole value has to be composed by several instructions.
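
    The RISC exception can be sketched in a few lines. This assumes a MIPS-style 32-bit machine where an instruction carries only a 16-bit immediate; the class `LoadImmediate` and its methods are hypothetical, and a real assembler has more special cases (for instance when the lower half is zero).

    ```java
    // Hypothetical sketch of how an assembler might expand "li reg, value"
    // when real instructions can hold only 16 bits of immediate data.
    class LoadImmediate {
        static String[] expand(String reg, int value) {
            int upper = value >>> 16;    // top 16 bits
            int lower = value & 0xFFFF;  // bottom 16 bits
            if (upper == 0) {
                // Fits in one instruction: still a 1:1 mapping.
                return new String[] { "ori " + reg + ", $zero, " + String.format("0x%04X", lower) };
            }
            // Otherwise one source line becomes two machine instructions.
            return new String[] {
                "lui " + reg + ", " + String.format("0x%04X", upper),
                "ori " + reg + ", " + reg + ", " + String.format("0x%04X", lower)
            };
        }

        // What the lui/ori pair computes together.
        static int recombine(int upper, int lower) {
            return (upper << 16) | lower;
        }

        public static void main(String[] args) {
            for (String line : expand("$t0", 0x12345678)) {
                System.out.println(line);
            }
        }
    }
    ```

    Even in this case each emitted line is still a real instruction of the target CPU, which is what separates assembly from bytecode.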

    Java bytecode is a fairly high-level abstraction compared to most CPUs' machine code, with very little overlap in instructions and memory model. The only similarity is that bytecode is stored in binary form, just like machine code.


    edit:

    In principle the JVM is an interpreter, i.e. it turns the bytecode into machine-level operations on the fly, at run time. In other languages that translation is done by the compiler at compile time.
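
    A pure interpreter in this sense is little more than a loop around a `switch`. The sketch below is not how any real JVM is implemented; `TinyInterpreter` is a made-up class that understands five integer opcodes (with their real JVM opcode values) and ignores everything else a JVM does, such as class loading, verification, objects and method calls.

    ```java
    import java.util.ArrayDeque;
    import java.util.Deque;

    // Hypothetical toy interpreter for a tiny subset of JVM integer opcodes.
    class TinyInterpreter {
        static int run(byte[] code) {
            Deque<Integer> stack = new ArrayDeque<>(); // bytecode works on an operand stack, not CPU registers
            int pc = 0;
            while (true) {
                int opcode = code[pc++] & 0xFF;
                switch (opcode) {
                    case 0x05: stack.push(2); break;                          // iconst_2
                    case 0x10: stack.push((int) code[pc++]); break;           // bipush <signed byte>
                    case 0x60: stack.push(stack.pop() + stack.pop()); break;  // iadd
                    case 0x68: stack.push(stack.pop() * stack.pop()); break;  // imul
                    case 0xAC: return stack.pop();                            // ireturn
                    default:
                        throw new IllegalStateException("unsupported opcode " + opcode);
                }
            }
        }

        public static void main(String[] args) {
            // bipush 42, iconst_2, imul, ireturn
            byte[] code = {0x10, 0x2A, 0x05, 0x68, (byte) 0xAC};
            System.out.println(run(code)); // prints 84
        }
    }
    ```

    The CPU never sees these bytes as instructions; it only runs the machine code of the loop that reads them.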

    Modern JVMs are not classic pure interpreters. They use a "JIT" (Just In Time) compiler to compile small pieces of Java bytecode into native machine code just ahead of its execution, using caches to avoid compiling already known .class files a second time. They also track performance data at run time to tell the JIT compiler which bytecode should be optimized heavily (code that runs often, or an inner loop) and which should just be compiled ASAP, without a focus on performance.
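
    The "track what is hot, then compile and cache it" idea can be modelled roughly as below. This is a toy under loud assumptions: `ToyJit`, its threshold of 3 and the set standing in for a code cache are all invented for illustration, and nothing is actually compiled. Real JVMs use much larger, tunable thresholds and several compilation tiers.

    ```java
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical model of hot-method detection; no real compilation happens here.
    class ToyJit {
        static final int HOT_THRESHOLD = 3; // made-up value for the example

        private final Map<String, Integer> callCounts = new HashMap<>();
        private final Set<String> compiledCache = new HashSet<>(); // stands in for cached native code

        // Returns which path handled this call: "interpreted" or "compiled".
        String invoke(String method) {
            if (compiledCache.contains(method)) {
                return "compiled"; // already compiled: reuse, no second compilation
            }
            int calls = callCounts.merge(method, 1, Integer::sum);
            if (calls >= HOT_THRESHOLD) {
                compiledCache.add(method); // pretend the JIT produced native code now
            }
            return "interpreted";
        }

        public static void main(String[] args) {
            ToyJit jit = new ToyJit();
            for (int i = 1; i <= 5; i++) {
                System.out.println("call " + i + ": " + jit.invoke("innerLoop"));
            }
        }
    }
    ```

    With these assumptions the first three calls take the interpreted path and every later call takes the compiled one.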

    So with a modern JVM it's hard to talk about interpreters; it is quite a sophisticated and complex solution. C# often goes one step further, sometimes delivering parts of the binaries pre-compiled into machine code for common platforms (keeping the bytecode form only as a fallback for uncommon platforms).

    Nothing like this (or even similar) happens with machine code. It just executes on the CPU.