Tags: java, memory, scope, stack, local-variables

How does Java handle memory with regards to homonymous local variables declared inside different not-nested code blocks inside a method?


I'm new to Java and programming in general. I'm currently studying how Java handles variable storage and scope. What I've understood is that:

  • local variables (i.e. variables declared inside methods) are stored on the stack of the respective thread, inside the stack frame of the method where they are declared. Once the method finishes executing, its stack frame is removed from the stack and consequently the local variables cease to exist

  • the scope of a local variable goes from the point where it is declared to the end of the block that contains it, including any nested blocks. Therefore it is possible to declare two homonymous variables inside the same method if their scopes don't overlap. So it is legal to write something like this:

    public void myMethod(){
    
        String[] myArray = new String[2];
    
        for(int i=0; i< myArray.length; i++){
            String message = "hello";
            System.out.println(message);
    
        }
    
        if(true){
            int number =3;
            System.out.println(number);
        }
    
        String message = "Ciao";
        int number =-50;
    
        System.out.println(message);
        System.out.println(number);
    
    }
    

What I would like to understand is how Java handles memory for homonymous variables that exist inside the same method but whose scopes don't overlap. As I said earlier, I've learnt that local variables are stored inside the method's stack frame until the method ends. But if that is really the case, how can two variables with the same name be stored inside the same stack frame? Do they both live in the method's stack frame until the method ends? Or is a local variable removed from the stack frame once its scope has ended for good, even though the method hasn't finished yet, which then allows another local variable with the same name to be created?


Solution

  • Local variables don't exist. At all.

    Running java code is a multi-step process. First, a compiler (javac.exe) turns .java files into .class files. Then a runtime (java.exe) runs the class files.

    That first step (compilation) is very formulaic: a spec spells out quite precisely what javac must do. Contrast this with C compilers, which are allowed to, for example, do a deep analysis of your code, determine that a loop has no side effects whatsoever, and completely eliminate it from the executable that gcc produces. It is during this compilation step that local variables, as you wrote them, are lost.

    Specifically, at the class file level, the system uses something very different: an operand stack, and numbered slots.

    Any method declares how many 'slots' it needs. The way java's specification works is: The spec decrees what should happen and which guarantees must be provided. It never spells out how things are done. But, sometimes explaining one particular 'how' is simpler than trying to delve into the spec. Know that this describes how most JVMs do it - but a JVM implementation doesn't have to:

    • Any method that declares, say, "I want 5 slots" results in, upon entry, the stack pointer being advanced by 5. The java class verifier already checks that any given method cannot possibly 'pop' more values off the stack than it 'pushes'.
    • Methods always start with the parameters already pushed on the stack. It's up to the runtime how to reconcile this bullet point with the previous one: it could swap those with the 'slots', or store the 'slots' elsewhere, or store slots at the end, or check how much stack space it needs and put the 'slots' past that. (From the bytecode's point of view, the parameters are simply addressed as the lowest-numbered slots.)
    • The method consists of bytecode and it is executed.

    Bytecode does not refer to 'local variables' because these do not exist in bytecode. Instead, bytecode can:

    • Refer to the stack, such as POP, which removes and discards the top of the stack, or FADD, which pops 2 values off of the stack, adds them as floats, and pushes the result back onto the stack. The JVM explodes if they aren't floats, so we can assume they are. (NB: The verifier checks up front that the 'explodes' situation cannot occur. It sounds more dramatic than it is).
    • Run some bytecode that refers to a slot. For example, ALOAD_1 which is a simple bytecode instruction that fetches an object reference from 'slot #1' (that'd be the second slot - the first slot is slot #0), and pushes it onto the stack.

    Thus, this java code:

    int a = readInt();
    int b = readInt();
    println(a + b);
    

    might be compiled as:

    [START METHOD]
    [META: SLOT SIZE: 2]
    
    // int a = readInt(); - 'a' translates to slot 0.
    INVOKESTATIC com.foo.KeyboardInput readInt()I
    ISTORE_0
    
    // int b = readInt(); - 'b' translates to slot 1.
    INVOKESTATIC com.foo.KeyboardInput readInt()I
    ISTORE_1
    
    ILOAD_0 // load int value in slot 0 and push onto stack.
    ILOAD_1 // load int value in slot 1 and push onto stack.
    IADD    // pop 2 int values, add them, push it back
    INVOKESTATIC com.foo.BasicOutput println(I)V
    

    Where println consumes 1 int off the stack.
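
    To make the slot/stack split concrete, here is a minimal sketch that replays the listing above on a toy frame. Everything in it (FrameSketch, addViaSlots, the push helper standing in for the readInt() call) is a made-up illustration, not how any real JVM is implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class FrameSketch {
    // Hypothetical stand-in for one method invocation:
    // a fixed number of numbered slots plus an operand stack.
    private final int[] slots;
    private final Deque<Integer> operandStack = new ArrayDeque<>();

    FrameSketch(int slotCount) {
        this.slots = new int[slotCount];
    }

    // Stands in for a call that leaves an int result on the stack.
    void push(int value) { operandStack.push(value); }

    // ISTORE_n: pop the top of the stack into slot n.
    void istore(int slot) { slots[slot] = operandStack.pop(); }

    // ILOAD_n: push a copy of slot n onto the stack.
    void iload(int slot) { operandStack.push(slots[slot]); }

    // IADD: pop two ints, push their sum.
    void iadd() {
        int right = operandStack.pop();
        int left = operandStack.pop();
        operandStack.push(left + right);
    }

    int pop() { return operandStack.pop(); }

    // Mirrors the listing: the two readInt() results are passed in as arguments.
    static int addViaSlots(int first, int second) {
        FrameSketch frame = new FrameSketch(2); // [META: SLOT SIZE: 2]
        frame.push(first);   // INVOKESTATIC readInt
        frame.istore(0);     // ISTORE_0 ('a')
        frame.push(second);  // INVOKESTATIC readInt
        frame.istore(1);     // ISTORE_1 ('b')
        frame.iload(0);      // ILOAD_0
        frame.iload(1);      // ILOAD_1
        frame.iadd();        // IADD
        return frame.pop();  // the value println would consume
    }

    public static void main(String[] args) {
        System.out.println(addViaSlots(3, 4)); // prints 7
    }
}
```

    Note that the names 'a' and 'b' appear only in comments: the toy frame, like bytecode, only ever sees slot numbers.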

    Let's now 'fancy up' our method and introduce another local var:

    int a = readInt();
    int b = readInt();
    int c = a + b;
    println(c);
    

    This code is easily identifiable as entirely identical in operation to the first. A compiler is free to produce the exact same bytecode for it, with no 'third slot' in sight, because there's no need for one: a local variable does not have to translate 'one-to-one' to a slot. As long as you don't use c anywhere else, nothing requires it to exist as a slot. (In practice javac is fairly literal-minded and will usually give c its own slot anyway; the point is that nothing in the class file format forces it to.)

    In fact, a compiler could go further and end up with.. no slots whatsoever! After all, if that is the entire method, and a/b/c aren't used anywhere else, it could produce:

    [START METHOD]
    [META: SLOT SIZE: 0]
    
    // readInt(); - just leave the read value on the stack.
    INVOKESTATIC com.foo.KeyboardInput readInt()I
    
    // readInt(); again - the second value also stays on the stack.
    INVOKESTATIC com.foo.KeyboardInput readInt()I
    
    IADD    // pop 2 int values, add them, push it back
    INVOKESTATIC com.foo.BasicOutput println(I)V
    

    No need for even a single slot.

    Similarly then, imagine this method:

    void example() {
      int a = ....;
      int b = ....;
      // Tons and tons of math with a and b.
      println(a + b);
    
      // from here on out, a and b are never used again.
    
      int c = ....;
      int d = ....;
      // Tons and tons of math with c and d.
      // note that a and b are not used in this code
    }
    

    The compiler needs at most 2 slots here, because 'c' and 'd' can just go where 'a' and 'b' used to be. A compiler is perfectly capable of concluding that 'a' and 'b', as they aren't being used any more, can be 'overwritten'.
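
    The simplest form of this reuse is scope-based: once a block ends, its slots are free again. The sketch below models that for the method in the question. It is a hypothetical model (SlotAllocatorSketch and its methods are invented for illustration, and every variable is assumed to take one slot, which is not true for long and double), not javac's actual allocator.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SlotAllocatorSketch {
    private int nextFree = 0;  // lowest slot index not currently in use
    private int maxSlots = 0;  // high-water mark: slots the method must declare
    private final Deque<Integer> blockStarts = new ArrayDeque<>();

    // Remember where the block started so its slots can be released later.
    void enterBlock() { blockStarts.push(nextFree); }

    // Everything declared inside the block goes out of scope; reuse its slots.
    void exitBlock() { nextFree = blockStarts.pop(); }

    // Hand out the next free slot; the name is only used for printing.
    int declare(String name) {
        int slot = nextFree++;
        maxSlots = Math.max(maxSlots, nextFree);
        System.out.println(name + " -> slot " + slot);
        return slot;
    }

    int maxSlots() { return maxSlots; }

    public static void main(String[] args) {
        SlotAllocatorSketch alloc = new SlotAllocatorSketch();
        alloc.declare("this");      // slot 0 (instance method)
        alloc.declare("myArray");   // slot 1

        alloc.enterBlock();         // for loop
        alloc.declare("i");         // slot 2
        alloc.declare("message");   // slot 3
        alloc.exitBlock();

        alloc.enterBlock();         // if block
        alloc.declare("number");    // slot 2, reusing the slot 'i' had
        alloc.exitBlock();

        alloc.declare("message");   // slot 2 again
        alloc.declare("number");    // slot 3, where the loop's 'message' was
        System.out.println("slots needed: " + alloc.maxSlots()); // 4
    }
}
```

    In this model the two variables named message never coexist: by the time the second one is declared, the first one's slot has already been handed back, and the name plays no role in the bookkeeping at all.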

    Hence, there is simply no one-to-one mapping of local variables to slots:

    • A local variable may not result in a slot at all.
    • A single slot may serve as the storage place for multiple locals.
    • A slot may be declared even if no single local variable exists.
    • A single local variable may end up being used in different slot positions.
    • Slot 0 is always reserved for the this reference in instance methods.
    • Names of local variables aren't needed in class files. Unless you compile with debug info (e.g. javac -g), they aren't stored at all, and even then they only live in an optional debugging table that the running code never consults. There is no need for the class file to know that the variables in our mathy parts are named 'a' and 'b'. This is freeing: name your variables whatever you want. Preferably, something that clearly indicates what they do. There is zero effect on how fast it runs if you pick longer names. In contrast to e.g. JavaScript, where it is theoretically possible you'd notice, and JS minimizers love peddling the idea that shorter names don't just make things smaller, they also make things faster. None of that applies to java.

    With all that context, your question answers itself: local variables, as named things, don't exist in class files, so 'how does the JVM deal with 2 locals with the same name' is trivial. It only ever sees slot numbers.