I was learning Rust with a book and the following excerpt threw me off a bit:
Also note that &str has the & in front of it because you need a reference to use a str. That's because of the reason we saw above: the stack needs to know the size, and a str can be of any length. So we access it with a &, a reference. The compiler knows the size of a reference's pointer, and it can then use the & to find where the str data is and read it. Also, because you use a & to interact with a str, you don't own it. But a String is an "owned" type.
I understand that for variables of unknown size, you must place the data on the heap and then refer to it with a fixed-length pointer on the stack. My confusion lies with the statement that str can be of any length.
Why can't a String type also be of unknown length at times and require the whole reference to data on heap approach?
I understand that the book will probably dive deeper into the details later on, but I was wondering if someone could already provide some additional context for me, specifically regarding the question above? Any useful accompanying details regarding the &str and String types in Rust, that are good to know for a beginner to the language, are highly appreciated as well.
Like a slice [T], str is a variably-sized type. (In fact, str is essentially a [u8] guaranteed to contain valid UTF-8.)
Variably-sized types are special. They do not implement the Sized trait. A reference to a variably-sized type is "fat": it doesn't just hold the address of the referenced thing, but also its size.
str therefore means "some area in memory which contains valid UTF-8 data". And &str is "the address and size of such an area".
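The "fat" reference can be observed directly. The following is a minimal sketch, assuming a typical current Rust toolchain; the helper name parts is made up for illustration, and the exact layout of references is an implementation detail rather than something the language promises in this form.

```rust
use std::mem::size_of;

/// Returns the two pieces of information a `&str` carries:
/// the address of the first byte and the length in bytes.
/// (`parts` is a hypothetical helper, not a standard function.)
fn parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let text = "héllo";
    let (addr, len) = parts(text);
    // The length counts bytes, not characters: 'é' takes two bytes in UTF-8.
    println!("address = {:p}, length in bytes = {}", addr, len);

    // A reference to a sized type is one pointer wide...
    assert_eq!(size_of::<&u8>(), size_of::<usize>());
    // ...while a reference to `str` (or to a slice) is two: address plus length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<&[u8]>(), size_of::<&str>());
}
```

Note that the length travels with the reference, which is why a str needs no terminator byte.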
String on the other hand is a struct with a fixed size. One of its members is a pointer to string data somewhere else (on the heap). Conceptually, a String contains a &str along with the unused capacity of the memory area. (In reality, a String is a wrapper around a Vec<u8> with a UTF-8 guarantee; a Vec<u8> conceptually contains a &[u8] plus capacity, but is really a raw pointer, a length and a capacity.)
The total memory required by a String is therefore still variable, but the size of the String struct itself is known at compile time.
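To make the split between the fixed-size struct and the variable-size data concrete, here is a small sketch. The helper footprint is a hypothetical name, and the claim that String has the same size as Vec<u8> reflects how the standard library is implemented today rather than a documented guarantee.

```rust
use std::mem::{size_of, size_of_val};

/// Returns (size of the String struct itself, bytes of heap capacity it manages).
/// (`footprint` is a hypothetical helper for illustration.)
fn footprint(s: &String) -> (usize, usize) {
    (size_of_val(s), s.capacity())
}

fn main() {
    let short = String::from("hi");
    let long = "x".repeat(1000);

    // The struct is the same size no matter how much text it manages.
    assert_eq!(footprint(&short).0, footprint(&long).0);
    assert_eq!(size_of::<String>(), size_of::<Vec<u8>>());

    // The managed memory region is what varies.
    assert!(footprint(&long).1 >= 1000);

    // `str` has no compile-time size, but the size of a particular
    // str can be asked for at run time through a reference to it.
    assert_eq!(size_of_val(long.as_str()), 1000);

    println!("struct: {} bytes, short: {:?}, long: {:?}",
             size_of::<String>(), footprint(&short), footprint(&long));
}
```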
Why is it this way? Because the entire point of String is to manage a memory region containing string data, and it can't do that if it is the memory region containing string data.
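A short sketch of what "managing" means in practice: the String owns its region, can grow it, and can hand out a &str view into it. The helper append_owned is a hypothetical name used only for this example.

```rust
/// Takes ownership of a String, appends to it, and gives it back.
/// (`append_owned` is a hypothetical helper for illustration.)
fn append_owned(mut owned: String, extra: &str) -> String {
    // push_str may reallocate the managed region if the capacity is too small.
    owned.push_str(extra);
    owned
}

fn main() {
    let mut s = String::with_capacity(4);
    s.push_str("abc");
    println!("len = {}, capacity = {}", s.len(), s.capacity());

    s = append_owned(s, ", and then a much longer tail");
    // The region always has room for the current contents.
    assert!(s.capacity() >= s.len());
    println!("len = {}, capacity = {}", s.len(), s.capacity());

    // Borrowing yields a &str: address and length of the used part of the region.
    let view: &str = &s;
    assert_eq!(view.len(), s.len());
    assert!(view.starts_with("abc"));
}
```

A &str cannot do any of the growing, because it only describes a region and does not own one.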
An aside:
I understand that for variables of unknown size, you must place the data on the heap
This is a misconception. The heap is the most obvious place to put variably-sized data, but it is not the only option: a language can also use an alloca equivalent to allocate variably-sized data on the stack.
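Stable Rust does not expose an alloca equivalent, so the sketch below shows a related point instead: the bytes a &str refers to do not have to live on the heap. It uses a fixed-size array (not a variably-sized allocation) as the stack example, and as_text is a hypothetical helper name.

```rust
use std::str;

/// Views a byte buffer as text without copying it.
/// (`as_text` is a hypothetical helper; it panics on invalid UTF-8.)
fn as_text(bytes: &[u8]) -> &str {
    str::from_utf8(bytes).expect("bytes should be valid UTF-8")
}

fn main() {
    // A local array: its bytes are part of this function's stack frame.
    let on_stack: [u8; 5] = *b"hello";
    let from_stack: &str = as_text(&on_stack);

    // A string literal: its bytes are baked into the program binary.
    let from_static: &str = "hello";

    // A String: its bytes are in a heap allocation it owns.
    let owned = String::from("hello");
    let from_heap: &str = owned.as_str();

    // All three are ordinary &str values; the type does not record
    // where the underlying bytes are stored.
    assert_eq!(from_stack, from_static);
    assert_eq!(from_stack, from_heap);
    println!("{} {} {}", from_stack, from_static, from_heap);
}
```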