Search code examples
c#stringprimitive-typesreference-type

Why is string a reference type?


Why is string a reference type, even though it's normally primitive data type such as int, float, or double.


Solution

  • Yikes, this answer got accepted and then I changed it. I should probably include the original answer at the bottom since that's what was accepted by the OP.

    New Answer

    Update: Here's the thing. string absolutely needs to behave like a reference type. The reasons for this have been touched on by all answers so far: the string type does not have a constant size, it makes no sense to copy the entire contents of a string from one method to another, string[] arrays would otherwise have to resize themelves -- just to name a few.

    But you could still define string as a struct that internally points to a char[] array or even a char* pointer and an int for its length, make it immutable, and voila!, you'd have a type that behaves like a reference type but is technically a value type.

    This would seem quite silly, honestly. As Eric Lippert has pointed out in a few of the comments to other answers, defining a value type like this is basically the same as defining a reference type. In nearly every sense, it would be indistinguishable from a reference type defined the same way.

    So the answer to the question "Why is string a reference type?" is, basically: "To make it a value type would just be silly." But if that's the only reason, then really, the logical conclusion is that string could actually have been defined as a struct as described above and there would be no particularly good argument against that choice.

    However, there are reasons that it's better to make string a class than a struct that are more than purely intellectual. Here are a couple I was able to think of:

    To prevent boxing

    If string were a value type, then every time you passed it to some method expecting an object it would have to be boxed, which would create a new object, which would bloat the heap and cause pointless GC pressure. Since strings are basically everywhere, having them cause boxing all the time would be a big problem.

    For intuitive equality comparison

    Yes, string could override Equals regardless of whether it's a reference type or value type. But if it were a value type, then ReferenceEquals("a", "a") would return false! This is because both arguments would get boxed, and boxed arguments never have equal references (as far as I know).

    So, even though it's true that you could define a value type to act just like a reference type by having it consist of a single reference type field, it would still not be exactly the same. So I maintain this as the more complete reason why string is a reference type: you could make it a value type, but this would only burden it with unnecessary weaknesses.


    Original Answer

    It's a reference type because only references to it are passed around.

    If it were a value type then every time you passed a string from one method to another the entire string would be copied*.

    Since it is a reference type, instead of string values like "Hello world!" being passed around -- "Hello world!" is 12 characters, by the way, which means it requires (at least) 24 bytes of storage -- only references to those strings are passed around. Passing around a reference is much cheaper than passing every single character in a string.

    Also, it's really not a normal primitive data type. Who told you that?

    *Actually, this isn't stricly true. If the string internally held a char[] array, then as long as the array type is a reference type, the contents of the string would actually not be passed by value -- only the reference to the array would be. I still think this is basically right answer, though.