What is the difference between how references and Box<T> are represented in memory?


I am trying to understand how references and Box<T> work. Let's consider a code example:

fn main() {
    let x = 5;
    let y = &x;

    assert_eq!(5, x);
    assert_eq!(5, *y);
}

In my imagination, Rust saves the value in memory as:

[diagram not available]

Consider this second code snippet with Box<T>:

fn main() {
    let x = 5;
    let y = Box::new(x);

    assert_eq!(5, x);
    assert_eq!(5, *y);
}

How is x going to be stored in Box? What does the memory look like?

The examples above are from the chapter "Treating Smart Pointers Like Regular References with the Deref Trait" of The Rust Programming Language. For the second example, the book explains:

The only difference between Listing 15-7 and Listing 15-6 is that here we set y to be an instance of a box pointing to the value in x rather than a reference pointing to the value of x.

Does it mean that y in the box points directly to value 5?


Solution

  • Your diagram for the simple case is fine, although it may be unclear as you use 5 for both the value and the address. I've moved y in my diagram to prevent any confusion.

    What does memory look like for a Box<T>?

    The equivalent diagram for Box would look similar, but with the addition of the heap:

        Stack
    
         ADDR                    VALUE
        +------------------------------+
    x = |0x0001|                     5 |
    y = |0x0002|                0xFF01 |
        |0x0003|                       |
        |0x0004|                       |
        |0x0005|                       |
        +------------------------------+
    
        Heap
    
         ADDR                    VALUE
        +------------------------------+
        |0xFF01|                     5 |
        |0xFF02|                       |
        |0xFF03|                       |
        |0xFF04|                       |
        |0xFF05|                       |
        +------------------------------+
    

    (See the pedantic notes below about this diagram)

    Box has allocated enough space in the heap for us, here at address 0xFF01. The value is then moved from the stack onto the heap.
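
    As a rough way to observe this, the sketch below compares the address of x with the address the Box points at. The helper name addresses is made up for illustration, and the printed numbers will differ between runs and platforms; only the fact that they differ is meaningful.

```rust
/// Hypothetical helper for this sketch: returns the address of a stack
/// variable and the address that a Box holding a copy of it points at.
fn addresses() -> (usize, usize) {
    let x = 5;
    let y = Box::new(x); // copies the 5 into a fresh heap allocation

    let stack_addr = &x as *const i32 as usize; // where x itself lives
    let heap_addr = &*y as *const i32 as usize; // where the boxed 5 lives
    (stack_addr, heap_addr)
}

fn main() {
    let (stack_addr, heap_addr) = addresses();
    // The exact values are not stable; they are printed only for inspection.
    println!("x lives at           {:#x}", stack_addr);
    println!("the boxed 5 lives at {:#x}", heap_addr);
    assert_ne!(stack_addr, heap_addr);
}
```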

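    Does it mean that y in the box points directly to value 5?

    Yes, but to a different 5 than the reference does. y holds a pointer to the heap allocation made by the Box (0xFF01 in the diagram), which contains its own copy of the value; it does not point at x on the stack. The Box must hold this pointer in order to be able to free the allocated memory when it goes out of scope.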

    The point of the chapter you are reading is that Rust will transparently dereference the Box for you, so you don't usually need to concern yourself with this fact.
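
    A small sketch of what "transparently" means in practice, assuming a made-up function double that only accepts a plain reference: the same operations work on a reference and on a Box.

```rust
/// Hypothetical function for this sketch; it only accepts a plain reference.
fn double(n: &i32) -> i32 {
    *n * 2
}

fn main() {
    let x: i32 = 5;
    let r = &x;
    let b = Box::new(x);

    // Explicit dereference looks the same for both.
    assert_eq!(*r, 5);
    assert_eq!(*b, 5);

    // Method calls auto-dereference through either pointer.
    assert_eq!(r.pow(2), 25);
    assert_eq!(b.pow(2), 25);

    // Deref coercion turns &Box<i32> into &i32 at the call site.
    assert_eq!(double(r), 10);
    assert_eq!(double(&b), 10);
}
```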

    What's the difference in memory?

    This might bend your brain a little bit!

    Looking at the stack for both examples, there isn't really a difference between the two cases: both the reference and the Box are stored on the stack as a pointer. The only difference is in the generated code, which treats the value on the stack differently depending on whether it is a reference or a Box. For example, only the Box frees the memory it points to when it goes out of scope.

    In fact, this is true for everything in Rust! To the computer, it's all just bits, and the structure encoded in the program binary is the only thing that distinguishes one blob of bytes from another.
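
    One way to check the "both are just a pointer" claim is to compare sizes. The helper name pointer_sizes is invented for this sketch, and the comparison assumes a sized pointee such as i32.

```rust
use std::mem::size_of;

/// Hypothetical helper: sizes in bytes of a reference, a Box, and a usize.
fn pointer_sizes() -> (usize, usize, usize) {
    (size_of::<&i32>(), size_of::<Box<i32>>(), size_of::<usize>())
}

fn main() {
    let (reference, boxed, word) = pointer_sizes();
    println!("&i32: {} bytes, Box<i32>: {} bytes, usize: {} bytes", reference, boxed, word);

    // Both are exactly one pointer wide for a sized pointee.
    assert_eq!(reference, word);
    assert_eq!(boxed, word);
}
```

    This only holds for sized types; pointers to slices or trait objects carry extra metadata and are wider.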

    Why is x still on the stack after being moved to the Box?

    Observant readers will note that I left the value 5 for x on the stack. There are two relevant reasons why:

    1. That's actually what happens in memory. Programs don't usually "reset" values they are done with as it would be unneeded overhead. Rust avoids problems by marking the variable as moved and disallowing access to the moved-from variable.

    2. In this case, i32 implements Copy, which means that passing x to Box::new copies the value instead of invalidating x, so the compiler allows us to continue accessing x. This wouldn't be true if x were a type that didn't implement Copy, such as a String or a Box.
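
    The two points above can be sketched as follows. The helper names are invented for illustration, and the commented-out line shows the access that the compiler would reject.

```rust
/// Hypothetical helper: boxes a Copy value; the original stays usable.
fn box_copy_value() -> (i32, i32) {
    let x = 5;
    let y = Box::new(x); // i32 is Copy, so x is copied and remains valid
    (x, *y)
}

/// Hypothetical helper: boxes a non-Copy value; the original is moved.
fn box_moved_value() -> String {
    let s = String::from("hello");
    let boxed = Box::new(s); // s is moved into the Box
    // println!("{}", s); // would not compile: borrow of moved value `s`
    *boxed // moving out of the Box hands the String back
}

fn main() {
    assert_eq!(box_copy_value(), (5, 5));
    assert_eq!(box_moved_value(), "hello");
}
```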

    Pedantic diagram notes

    • This diagram is not to scale. An i32 takes 4 bytes and a pointer or reference takes a platform-dependent number of bytes, but it's simpler to assume everything is the same size.

    • The stack typically starts at a high address and grows downward, while the heap starts at a low address and grows upward.