As we all know, strings in .NET are immutable. (Well, not 100% totally immutable, but immutable by design and used as such by any reasonable person, anyway.)
This makes it basically OK that, for example, the following code just stores a reference to the same string in two variables:
string x = "shark";
string y = x.Substring(0);
// Proof:
fixed (char* c = y)
{
c[4] = 'p';
}
Console.WriteLine(x);
Console.WriteLine(y);
The above outputs:
sharp
sharp
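The same identity check can be sketched without unsafe code or mutating a string. This is only an illustration: the helper name SharesInstance is made up for this example, and the fact that Substring(0) returns the original instance is an implementation detail of the runtime, not something the documentation promises.

```csharp
using System;

public static class SubstringIdentity
{
    // Hypothetical helper: true when Substring(startIndex) hands back
    // the very same object as the source string.
    public static bool SharesInstance(string source, int startIndex) =>
        ReferenceEquals(source, source.Substring(startIndex));

    public static void Main()
    {
        string x = "shark";

        // Substring(0) covers the whole string, so no copy is needed.
        Console.WriteLine(SharesInstance(x, 0));

        // Substring(1) has to produce a distinct string object.
        Console.WriteLine(SharesInstance(x, 1));
    }
}
```

On the runtimes where the pointer experiment above prints "sharp" twice, the first line should print True and the second False.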
Clearly x and y refer to the same string object. So here's my question: why wouldn't Substring always share state with the source string? A string is essentially a char* pointer with a length, right? So it seems to me the following should, at least in theory, be allowed to allocate a single block of memory to hold 5 characters, with two variables simply pointing to different locations within that (immutable) block:
string x = "shark";
string y = x.Substring(1);
// Does c[0] point to the same location as x[1]?
fixed (char* c = y)
{
c[0] = 'p';
}
// Apparently not...
Console.WriteLine(x);
Console.WriteLine(y);
The above outputs:
shark
park
For two reasons:
The string metadata (e.g. the length) is stored in the same memory block as the characters. Allowing one string to use part of another string's character data would mean allocating two memory blocks for most strings instead of one. As most strings are not substrings of other strings, that extra allocation would cost more memory than reusing parts of strings could save.
There is an extra NUL character stored after the last character of the string, so that the string can also be used by system functions that expect a null-terminated string. You can't put that extra NUL character after a substring that ends partway through another string, because the source string's next character already occupies that position.
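To make the trade-off concrete, here is a rough sketch of what a sharing substring would have to look like. StringSlice is a hypothetical type invented for this example, not part of .NET: it keeps its offset and length outside the source string's memory block, which is exactly the extra bookkeeping the first point describes, and it has no terminator of its own, which is the second point.

```csharp
using System;

// Hypothetical type, not part of .NET: a view over the tail of an existing string.
public readonly struct StringSlice
{
    private readonly string _source; // shared character data, never copied
    private readonly int _offset;    // where the view starts inside _source

    // Length lives here, outside the source string's own memory block.
    public int Length { get; }

    public StringSlice(string source, int offset)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (offset < 0 || offset > source.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        _source = source;
        _offset = offset;
        Length = source.Length - offset;
    }

    public char this[int index]
    {
        get
        {
            if (index < 0 || index >= Length)
                throw new ArgumentOutOfRangeException(nameof(index));
            return _source[_offset + index];
        }
    }

    // Turning the view into a real string still copies the characters,
    // because a real string needs its own length and terminator.
    public override string ToString() => _source.Substring(_offset, Length);
}

public static class SliceDemo
{
    public static void Main()
    {
        string x = "shark";
        var y = new StringSlice(x, 1);

        Console.WriteLine(y[0]);         // h
        Console.WriteLine(y.Length);     // 4
        Console.WriteLine(y.ToString()); // hark
    }
}
```

Every slice carries a reference plus two integers in addition to the source string, so the design only pays off when substrings are common and long-lived, which the answer argues is not the typical case.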