If boolean is represented as int in the JVM, how does it correspond with Java being strongly-typed?


I read that JVM represents boolean as 4-byte (int). My question is this: Java is strongly typed, and converting a boolean to int is not allowed. As I understand, JVM is used to run code accordingly to Java's specs (?), so if an expression like 3 + true, written in C++, is compiled to bytecode, it is legal.

What am I missing?


Solution

  • Your understanding “JVM is used to run code accordingly to Java's specs” is wrong.

    There are two distinct specifications:

    1. The Java® Language Specification, which describes the semantics and behavior of the Java programming language. The language is typically used to create software that runs in a JVM, but this is not a strict requirement.
    2. The Java® Virtual Machine Specification, which describes the Java Virtual Machine, a certain execution environment that has been designed to be a convenient target platform for software written in the Java programming language, but is not restricted to this use.

    This is clarified right in the JVM Spec §1.2:

    The Java Virtual Machine knows nothing of the Java programming language, only of a particular binary format, the class file format. A class file contains Java Virtual Machine instructions (or bytecodes) and a symbol table, as well as other ancillary information.

    So there can be arbitrary differences between the two, which Java compilers have to accommodate when compiling source code of the Java programming language for the Java Virtual Machine as the execution environment.


    That said, it is wrong to say that “JVM represents boolean as 4-byte (int)”. You have been misled by the fact that, in certain places, values of distinct types are handled by the same instructions. In particular, local variables of type boolean and int are handled by the same instructions. However, the same applies to byte, short, and char: all five types are handled identically, using the same bytecode instructions.
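
    One way to see this is to disassemble a small class with javap -c. The sketch below is only an illustration: the class LocalSlots and the method pick are made-up names, and the instruction names in the comments are what javac typically emits for such code, which may vary between compiler versions.

```java
class LocalSlots {
    // Hypothetical example method: a boolean local and an int local side by side.
    static int pick() {
        boolean flag = true; // typically compiled to: iconst_1, istore_0
        int number = 1;      // typically compiled to: iconst_1, istore_1
        // Reading the boolean back typically uses iload_0, the same
        // instruction family that reads the int (iload_1).
        return flag ? number : 0;
    }

    public static void main(String[] args) {
        System.out.println(pick());
    }
}
```

    After compiling with javac, running javap -c LocalSlots should list the same kind of load and store instructions for both locals, with nothing in the instruction stream marking one of them as boolean.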

    Actually, the fact that long and double are handled by other instructions is a historical compromise that simplified implementations at the time the first JVM was designed. The type of each variable and stack entry can be inferred at every point in the code, so a generic instruction set without any encoded type information would work as well.

    While the instructions dealing with local variables and the operand stack make no distinction between boolean, byte, short, char, and int, the JVM does distinguish between all these types when it comes to method and field signatures. There, boolean is a dedicated type. In contrast, when it comes to arrays, boolean arrays and byte arrays are handled using the same instructions, which are different from the instructions dealing with int arrays. Still, boolean[] and byte[] arrays themselves have distinct types.
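
    The distinction in signatures and array types can be observed from Java code itself. The following is a minimal sketch, not a definitive demonstration: the class DescriptorDemo and its methods describe and descriptorOf are hypothetical names introduced for this example, while MethodType.toMethodDescriptorString() and Class.getName() are standard library calls.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {
    // Hypothetical overloads: they can coexist only because boolean and int
    // are different types in method signatures.
    public static String describe(boolean value) { return "boolean"; }
    public static String describe(int value) { return "int"; }

    // Builds the JVM method descriptor of a void method taking one parameter.
    public static String descriptorOf(Class<?> parameterType) {
        return MethodType.methodType(void.class, parameterType)
                         .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(descriptorOf(boolean.class)); // (Z)V
        System.out.println(descriptorOf(int.class));     // (I)V
        System.out.println(boolean[].class.getName());   // [Z
        System.out.println(byte[].class.getName());      // [B
        System.out.println(describe(true) + " / " + describe(1));
    }
}
```

    In descriptors, Z stands for boolean, I for int, and B for byte, so the two describe overloads end up with different descriptors, and boolean[] and byte[] remain different classes even though the same array instructions access their elements.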

    Whether the actual storage of values of these types differs is entirely up to the particular JVM implementation.