c# .net string referenceequals

Is it possible to create a string that's not reference-equal to any other string?


It seems like .NET goes out of its way to make strings that are equal by value equal by reference.

In LINQPad, I tried the following, hoping it'd bypass interning string constants:

var s1 = new string("".ToCharArray());
var s2 = new string("".ToCharArray());

object.ReferenceEquals(s1, s2).Dump();

but that returns true. However, I want to create a string that's reliably distinguishable from any other string object.

(The use case is creating a sentinel value to use for an optional parameter. I'm wrapping WebForms' Page.Validate(), and I want to choose the appropriate overload depending on whether the caller gave me the optional validation group argument. So I want to be able to detect whether the caller omitted that argument or passed a value that happens to be equal to my default value. Obviously there are other, less arcane ways of approaching this specific use case; the aim of this question is more academic.)


Solution

  • It seems like .NET goes out of its way to make strings that are equal by value equal by reference.

    Actually, there are really only two special cases for strings that exhibit behavior like what you're describing here:

    1. String literals in your code are interned, so the same literal in two places will result in a reference to the same object.
    2. The empty string is a particularly odd case: as far as I know, every empty string in a .NET program is in fact the same object (i.e., all empty strings are a single shared instance). This is the only case I know of in .NET where using the new keyword on a class may not result in the allocation of a new object.
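
The two cases above can be observed with a small console program. This is only a sketch: the class name InterningDemo is made up for illustration, and the outputs in the comments are what I would expect on mainstream .NET runtimes rather than anything the language specification guarantees.

```csharp
using System;

static class InterningDemo
{
    static void Main()
    {
        // Case 1: identical literals are interned, so both variables
        // refer to the same string object.
        string a = "hello";
        string b = "hello";
        Console.WriteLine(object.ReferenceEquals(a, b)); // True

        // A non-empty string built at run time is a separate object,
        // even though its contents are equal to the literal.
        string c = new string("hello".ToCharArray());
        Console.WriteLine(a == c);                        // True (value equality)
        Console.WriteLine(object.ReferenceEquals(a, c)); // False

        // Case 2: constructing an empty string hands back the shared
        // empty instance instead of allocating a new object.
        string e1 = new string(new char[0]);
        string e2 = new string("".ToCharArray());
        Console.WriteLine(object.ReferenceEquals(e1, e2));           // True
        Console.WriteLine(object.ReferenceEquals(e1, string.Empty)); // True
    }
}
```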

    From your question I get the impression you already knew about the first case. The second case is the one you've stumbled across. As others have pointed out, if you just go ahead and use a non-empty string, you'll find it's quite easy to create a string that isn't reference-equal to any other string in your program:

    public static readonly string Sentinel = new string(new char[] { 'x' });
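
A rough sketch of how such a sentinel could be checked follows. The names SentinelDemo and IsSentinel are hypothetical and not part of any framework API. Note also that C# requires optional-parameter defaults to be compile-time constants, so a run-time sentinel like this cannot appear directly as a default value; the sketch only shows the reference check itself.

```csharp
using System;

static class SentinelDemo
{
    // Built at run time from a non-empty char array, so this is a fresh
    // object that no literal or other string shares a reference with.
    public static readonly string Sentinel = new string(new char[] { 'x' });

    // Hypothetical helper: true only for the sentinel object itself,
    // not for strings that merely have the same contents.
    static bool IsSentinel(string value)
    {
        return object.ReferenceEquals(value, Sentinel);
    }

    static void Main()
    {
        Console.WriteLine(IsSentinel(Sentinel));           // True
        Console.WriteLine(IsSentinel("x"));                // False: the literal is a different object
        Console.WriteLine(IsSentinel(new string('x', 1))); // False: another run-time string
        Console.WriteLine(Sentinel == "x");                // True: value equality still holds
    }
}
```

One caveat, as far as I understand the intern pool: passing the sentinel to string.Intern could make it the pooled instance for its value, so the sentinel should never be interned.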
    

    As a little editorial aside, I actually wouldn't mind this so much (as long as it were documented); but it kind of irks me that the CLR folks (?) implemented this optimization without also going ahead and doing the same for arrays. That is, it seems to me they might as well have gone ahead and made every new T[0] refer to the same object too. Or, you know, not done that for strings either.
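
To illustrate the contrast drawn in the aside, here is a minimal sketch (the class name EmptyArrayDemo is invented for the example) comparing two zero-length arrays with two constructed empty strings:

```csharp
using System;

static class EmptyArrayDemo
{
    static void Main()
    {
        // Each new T[0] allocates its own array object.
        int[] a = new int[0];
        int[] b = new int[0];
        Console.WriteLine(object.ReferenceEquals(a, b)); // False

        // Two constructed empty strings, by contrast, are the same object.
        string s1 = new string(new char[0]);
        string s2 = new string(new char[0]);
        Console.WriteLine(object.ReferenceEquals(s1, s2)); // True
    }
}
```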