c# .net string memory compiler-construction

How do strings look from the compiler's point of view?


In C, a string is a pointer to its first character, and its end is marked by a terminator ('\0'). To calculate the length of the string, the code (e.g., strlen) has to walk the character array and count elements until it finds '\0'.

In UCSD strings (Pascal-style), the length of the string is stored at the very beginning, ahead of the characters.
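
To make the contrast concrete, here is a minimal sketch of both conventions, written in C# over plain char arrays. The class and method names (RawStrings, NullTerminatedLength, LengthPrefixedLength) are made up for illustration and are not part of any library.

```csharp
using System;

static class RawStrings
{
    // C convention: scan from the start until the terminator '\0' is found.
    // The cost grows with the length of the string (O(n)).
    // Assumes the buffer really contains a terminator.
    public static int NullTerminatedLength(char[] buffer)
    {
        int length = 0;
        while (buffer[length] != '\0')
        {
            length++;
        }
        return length;
    }

    // Length-prefixed convention: the first element stores the length,
    // so reading it is a single lookup (O(1)).
    public static int LengthPrefixedLength(char[] buffer)
    {
        return buffer[0];
    }
}
```

For the text "abc", the first layout would be { 'a', 'b', 'c', '\0' } and the second { (char)3, 'a', 'b', 'c' }; both methods report a length of 3.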

And how does the compiler see C# strings? Yes, from the user's point of view, String is an object that has a Length property, but I'm not asking about the high-level view. I want to know the underlying mechanics: for example, how is the length of a string calculated?


Solution

  • Let's execute the following code:

    string s = "123";
    string s2 = "234";
    string s3 = s + s2;
    string s4 = s2 + s3;
    Console.WriteLine(s + s2);
    

    Now let's put a breakpoint at the last line and open the memory window:

    [Image: Strings]

    Entering s3 as the address in the memory window, we can see the two strings (s3 and s4) allocated one after the other, each with a 4-byte length field at the beginning.

    You can also see that other memory is allocated, such as the string class's type token and other string class data.
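
    The same length field can be read from code instead of the memory window. The sketch below assumes a recent .NET runtime where the Unsafe and MemoryMarshal helpers are available, and it relies on the 32-bit length sitting directly before the first character, as seen in the memory dump. That layout is an implementation detail, not a documented contract, and the names StringLayout and PeekLengthPrefix are hypothetical.

    ```csharp
    using System;
    using System.Runtime.CompilerServices;
    using System.Runtime.InteropServices;

    static class StringLayout
    {
        // ASSUMPTION: relies on the CLR's internal string layout (a 32-bit length
        // stored directly before the first character). For inspection only.
        public static int PeekLengthPrefix(string s)
        {
            // Managed reference to the first UTF-16 code unit of the string.
            ref char firstChar = ref MemoryMarshal.GetReference(s.AsSpan());

            // Step back two chars (2 x 2 bytes = 4 bytes) to reach the length field.
            ref char lengthField = ref Unsafe.Subtract(ref firstChar, 2);

            // Reinterpret those 4 bytes as an int and read them.
            return Unsafe.As<char, int>(ref lengthField);
        }
    }
    ```

    If the assumption holds, PeekLengthPrefix(s3) returns the same value as s3.Length, which is why reading Length never needs to scan the characters.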

    The String class itself contains a member, private int m_stringLength;, which holds the length of the string. This also makes string.Concat() very fast, because it can allocate the full length up front:

    int totalLength = str0.Length + str1.Length + str2.Length;
    
    String result = FastAllocateString(totalLength);
    FillStringChecked(result, 0, str0);
    FillStringChecked(result, str0.Length, str1);
    FillStringChecked(result, str0.Length + str1.Length, str2);
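
    FastAllocateString and FillStringChecked are internal to the runtime, so they cannot be called from user code. As a rough approximation of the same "measure first, allocate once, then copy" idea, a sketch using only public APIs might look like the following. ConcatSketch and Concat3 are hypothetical names; the sketch goes through a temporary char[] (which the real implementation avoids) and assumes non-null arguments.

    ```csharp
    using System;

    static class ConcatSketch
    {
        public static string Concat3(string str0, string str1, string str2)
        {
            // Each Length is a stored field, so the total is known without scanning.
            int totalLength = str0.Length + str1.Length + str2.Length;

            // Allocate the whole result once, instead of growing it piece by piece.
            char[] buffer = new char[totalLength];

            // Copy each piece to its final offset.
            str0.CopyTo(0, buffer, 0, str0.Length);
            str1.CopyTo(0, buffer, str0.Length, str1.Length);
            str2.CopyTo(0, buffer, str0.Length + str1.Length, str2.Length);

            return new string(buffer);
        }
    }
    ```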
    

    What I find a little strange is that the LINQ Count() extension method, when called on a string, uses the default implementation, which iterates over the characters one by one. String implements IEnumerable<char> but not ICollection<char>, so it does not get the shortcut that ICollection<T> types such as List<T> get, where Count() simply returns the ICollection<T>.Count property. To get a string's length, use the Length property instead.
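
    A small sketch to check this observation: it tests whether a value implements ICollection<char> and compares Count() with Length. The helper names (CountVersusLength, IsCharCollection, CountByEnumerable) are made up for illustration.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class CountVersusLength
    {
        // True only for types that expose ICollection<char>, such as List<char>.
        public static bool IsCharCollection(object value) => value is ICollection<char>;

        // Calls the LINQ extension method Enumerable.Count<char>() on the string.
        public static int CountByEnumerable(string s) => s.Count();
    }
    ```

    IsCharCollection("123") is false, while IsCharCollection(new List<char>()) is true. CountByEnumerable(s) still returns the same number as s.Length; it just takes the slower route to get there.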