pointers rust reference ownership borrowing

Does a function parameter that accepts a string reference point directly to the string variable or to the data on the heap in Rust?


I've taken this picture and code from The Rust Book.

Why does s point to s1 rather than just the data on the heap itself?

If so, is this how it works? How does s point to s1? Is it allocated memory with a ptr field that contains the memory address of s1? Then does s1, in turn, point to the data?

In s1, I appear to be looking at a variable with a pointer, length, and capacity. Is only the ptr field the actual pointer here?

This is my first systems level language, so I don't think comparisons to C/C++ will help me grok this. I think part of the problem is that I don't quite understand what exactly pointers are and how the OS allocates/deallocates memory.

Pointer

fn main() {
    let s1 = String::from("hello");

    let len = calculate_length(&s1);

    println!("The length of '{}' is {}.", s1, len);
}

fn calculate_length(s: &String) -> usize {
    s.len()
}

Solution

    • Memory is just a huge array of bytes that can be indexed by an offset (e.g. a u64).
    • This offset is called an address,
    • and a variable that stores an address is called a pointer.
    • However, usually only a small part of memory is allocated, so not every address is meaningful (or valid).
    • Allocation is a request to make a (contiguous) range of addresses meaningful to the program (so that it can access and modify them).
    • Every object (and by object I mean a value of any type) is located in allocated memory (because non-allocated memory is meaningless to the program).
    • A reference is actually a pointer that is guaranteed (by the compiler) to be valid (i.e. derived from the address of some object known to the compiler). See also the std documentation.

    Here is an example of these concepts (playground):

    use std::mem::MaybeUninit;

    fn main() {
        // In a real program this "memory" is defined implicitly;
        // it is made explicit here for the sake of the example.
        // A real address space spans every `usize` value; a small
        // array is used here so that the example can actually run.
        let mut memory = [MaybeUninit::<u8>::uninit(); 4096];

        // Every value of type `usize` can be used as an address.
        const SOME_ADDR: usize = 1234;

        // Any address can safely be bound to a raw pointer,
        // which *may* point to either valid or invalid memory.
        let ptr = SOME_ADDR as *const u8;

        // Knowing an address (offset), you can find that location in our memory.
        let other_ptr: *mut u8 = unsafe { memory.as_mut_ptr().cast::<u8>().add(SOME_ADDR) };

        // Oversimplified allocation; in real life the OS hands out a block of memory.
        unsafe { other_ptr.write(15); }

        // Now it is *meaningful* (i.e. there is no undefined behavior) to make a reference.
        let refr: &u8 = unsafe { &*other_ptr };
        println!("{:p} {:p} {}", ptr, other_ptr, refr);
    }
    

    I hope that clears most things up, but let's go through the questions explicitly.

    Why does s point to s1 rather than just the data on the heap itself?

    s is a reference (i.e. a valid pointer), so it holds the address of s1. The compiler might (and probably would) optimize it to be the same piece of memory as s1, but logically it remains a different object that points to s1.
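
    As a rough sketch of that point, the snippet below compares the address carried by the reference with the address of s1 and with the address of the heap buffer. The helper name address_held_by is made up for this illustration, and the printed addresses will differ from run to run.

```rust
// Hypothetical helper (not from the Rust Book): returns the address
// that the reference `s` carries.
fn address_held_by(s: &String) -> usize {
    s as *const String as usize
}

fn main() {
    let s1 = String::from("hello");

    // Where the String value (ptr, len, capacity) itself lives.
    let addr_of_s1 = &s1 as *const String as usize;
    // Where the bytes "hello" live on the heap.
    let addr_of_heap_data = s1.as_ptr() as usize;

    // The reference carries the address of `s1` itself...
    assert_eq!(address_held_by(&s1), addr_of_s1);
    // ...which is a different location from the heap buffer.
    assert_ne!(address_held_by(&s1), addr_of_heap_data);

    println!("s1 is at {:#x}, its heap data is at {:#x}", addr_of_s1, addr_of_heap_data);
}
```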

    How does the s point to s1. Is it allocated memory with a ptr field that contains the memory address of s1.

    The chain of "pointing" still holds, so a call to s.len() is conceptually converted to s.deref().len, and accessing a byte of the string's array is converted to s.deref().ptr.add(index).deref().
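
    To make the two hops concrete, here is a minimal sketch that follows them by hand. The ptr and len fields of String are private, so the public as_ptr() and len() methods stand in for them; the helper byte_at is hypothetical and exists only for this illustration.

```rust
// Hypothetical helper: reads byte `index` by following the pointers manually.
fn byte_at(s: &String, index: usize) -> u8 {
    assert!(index < s.len(), "index out of bounds");
    // Hop 1: follow the reference `s` to the String it points to.
    let string: &String = &*s;
    // The String's own pointer to its heap buffer.
    let data: *const u8 = string.as_ptr();
    // Hop 2: follow that pointer to the requested byte.
    // SAFETY: `index` was checked against the length above.
    unsafe { *data.add(index) }
}

fn main() {
    let s1 = String::from("hello");
    let s = &s1;

    // These calls all reach the same `len` through the reference.
    assert_eq!(s.len(), (*s).len());
    assert_eq!(s.len(), String::len(s));

    // Manual pointer chasing agrees with safe indexing.
    assert_eq!(byte_at(s, 1), b'e');
    assert_eq!(byte_at(s, 1), s.as_bytes()[1]);
}
```

    In everyday code you would simply write s.as_bytes()[index]; the unsafe version is there only to expose the chain.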

    Three blocks of memory are shown in the picture: &s, &s1 and s1.ptr are different (unless optimized) memory addresses, and all of them are stored in allocated memory. The first two are actually stored in memory that is pre-allocated (i.e. before the main function is called), known as the stack, which is usually not referred to as allocated memory (a convention I have ignored in this answer). The s1.ptr pointer, in contrast, points to memory that was allocated explicitly by the user program (i.e. after entering main).
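
    The three locations can be inspected directly. The sketch below uses a hypothetical helper, three_addresses, and only checks that the addresses are distinct; which of them fall on the stack and which on the heap depends on the platform and is not asserted here.

```rust
// Hypothetical helper: returns (address of `s`, address of `s1`, address of the data).
fn three_addresses(s: &String) -> (usize, usize, usize) {
    // Where the reference variable `s` itself is stored.
    let addr_of_s = &s as *const &String as usize;
    // Where the String value (`s1` in the caller) is stored.
    let addr_of_s1 = s as *const String as usize;
    // Where the string's bytes are stored.
    let addr_of_data = s.as_ptr() as usize;
    (addr_of_s, addr_of_s1, addr_of_data)
}

fn main() {
    let s1 = String::from("hello");
    let (addr_of_s, addr_of_s1, addr_of_data) = three_addresses(&s1);

    println!("&s      = {:#x}", addr_of_s);
    println!("&s1     = {:#x}", addr_of_s1);
    println!("s1 data = {:#x}", addr_of_data);

    // Three separate objects, three separate addresses.
    assert_ne!(addr_of_s, addr_of_s1);
    assert_ne!(addr_of_s1, addr_of_data);
    assert_ne!(addr_of_s, addr_of_data);
}
```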

    In s1, I appear to be looking at a variable with a pointer, length, and capacity. Is only the ptr field the actual pointer here?

    Yes, exactly. Length and capacity are just ordinary unsigned integers.
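
    A short sketch to check this: len() and capacity() both return usize, and a &String is the size of a single machine word. The size of String itself is printed rather than asserted, because its exact layout is an implementation detail (in practice it is typically three words: pointer, length, capacity).

```rust
use std::mem::size_of;

fn main() {
    let s1 = String::from("hello");

    // Length and capacity are plain unsigned integers.
    let len: usize = s1.len();
    let capacity: usize = s1.capacity();
    assert_eq!(len, 5);
    assert!(capacity >= len);

    // A reference to a String is a single pointer-sized value.
    assert_eq!(size_of::<&String>(), size_of::<usize>());

    // Layout detail, shown for information only.
    println!("size_of::<String>() = {} bytes", size_of::<String>());
    println!("size_of::<usize>()  = {} bytes", size_of::<usize>());
}
```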