Why is string a reference type, even though it is normally treated as a primitive data type like int, float, or double?
Yikes, this answer got accepted and then I changed it. I should probably include the original answer at the bottom since that's what was accepted by the OP.
Update: Here's the thing. string absolutely needs to behave like a reference type. The reasons for this have been touched on by all answers so far: the string type does not have a constant size, it makes no sense to copy the entire contents of a string from one method to another, string[] arrays would otherwise have to resize themselves -- just to name a few.
But you could still define string as a struct that internally points to a char[] array or even a char* pointer and an int for its length, make it immutable, and voila!, you'd have a type that behaves like a reference type but is technically a value type.
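To make that idea concrete, here is a minimal sketch of such a struct. The name StringStruct and its members are hypothetical, invented for this illustration; this is not how System.String is actually implemented. It uses a char[] field, whose own Length makes the separate int unnecessary.

```csharp
using System;

// Hypothetical type for illustration only; not how System.String works.
public readonly struct StringStruct
{
    // The only field is a reference to a heap-allocated array.
    private readonly char[] _chars;

    public StringStruct(string source)
    {
        // Take a private copy so nobody else can mutate the contents.
        _chars = source.ToCharArray();
    }

    // default(StringStruct) has a null array, so treat that as empty.
    public int Length => _chars?.Length ?? 0;

    public char this[int index] => _chars[index];

    public override string ToString() =>
        _chars == null ? string.Empty : new string(_chars);
}

public static class StructStringDemo
{
    public static void Main()
    {
        var a = new StringStruct("Hello world!");

        // Assignment copies the struct, which is just one reference;
        // the 12 characters themselves are not duplicated.
        var b = a;

        Console.WriteLine(b.Length);     // 12
        Console.WriteLine(b.ToString()); // Hello world!
    }
}
```

Copying the struct copies a single reference, so in everyday use it would behave much like a reference type.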
This would seem quite silly, honestly. As Eric Lippert has pointed out in a few of the comments to other answers, defining a value type like this is basically the same as defining a reference type. In nearly every sense, it would be indistinguishable from a reference type defined the same way.
So the answer to the question "Why is string a reference type?" is, basically: "To make it a value type would just be silly." But if that's the only reason, then really, the logical conclusion is that string could actually have been defined as a struct as described above and there would be no particularly good argument against that choice.
However, there are reasons that it's better to make string a class than a struct that are more than purely intellectual. Here are a couple I was able to think of:
If string were a value type, then every time you passed it to some method expecting an object it would have to be boxed, which would create a new object, which would bloat the heap and cause pointless GC pressure. Since strings are basically everywhere, having them cause boxing all the time would be a big problem.
Yes, string could override Equals regardless of whether it's a reference type or value type. But if it were a value type, then ReferenceEquals("a", "a") would return false! This is because both arguments would get boxed, and boxed arguments never have equal references (as far as I know).
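Both points can be seen with an existing value type. The sketch below uses int as a stand-in for a hypothetical value-type string; BoxingDemo and SameObject are names made up for this example.

```csharp
using System;

public static class BoxingDemo
{
    // Returns true only when both arguments refer to the same heap object.
    public static bool SameObject(object left, object right)
    {
        return object.ReferenceEquals(left, right);
    }

    public static void Main()
    {
        int n = 42;

        // Each conversion of a value type to object allocates its own box,
        // so the two arguments are two distinct objects.
        Console.WriteLine(SameObject(n, n)); // False

        // string is a reference type, so nothing is boxed, and identical
        // literals in the same assembly refer to one interned instance.
        Console.WriteLine(SameObject("a", "a")); // True
    }
}
```

Every call that passes the int as an object allocates a box, which is the GC pressure described above.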
So, even though it's true that you could define a value type to act just like a reference type by having it consist of a single reference type field, it would still not be exactly the same. So I maintain this as the more complete reason why string is a reference type: you could make it a value type, but this would only burden it with unnecessary weaknesses.
It's a reference type because only references to it are passed around.
If it were a value type then every time you passed a string from one method to another the entire string would be copied*.
Since it is a reference type, instead of string values like "Hello world!" being passed around -- "Hello world!" is 12 characters, by the way, which means it requires (at least) 24 bytes of storage -- only references to those strings are passed around. Passing around a reference is much cheaper than passing every single character in a string.
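The arithmetic is easy to check. This is a rough sketch that counts only the UTF-16 character data and ignores the object header and length field; CopyCost and CharacterBytes are hypothetical names.

```csharp
using System;

public static class CopyCost
{
    // Bytes needed for the characters alone: 2 bytes per UTF-16 char.
    public static int CharacterBytes(string s) => s.Length * sizeof(char);

    public static void Main()
    {
        string greeting = "Hello world!";

        Console.WriteLine(greeting.Length);          // 12
        Console.WriteLine(CharacterBytes(greeting)); // 24

        // Size of one reference: 4 or 8 bytes depending on the process.
        Console.WriteLine(IntPtr.Size);
    }
}
```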
Also, it's really not a normal primitive data type. Who told you that?
*Actually, this isn't strictly true. If the string internally held a char[] array, then as long as the array type is a reference type, the contents of the string would actually not be passed by value -- only the reference to the array would be. I still think this is basically the right answer, though.