We know that C# optimizes the concatenation of string literals. For those unaware, that's when C# internally turns this:
string myString = "one" + "two" + "three";
into this:
string myString = "onetwothree";
My question today is: what does the compiler do with interpolated strings? For example:
string myFunc()
{
    return "five six seven";
}
string mystring = "Onetwothree";
int myVal = 7;
string test = "eight" +
$"four {mystring} five {myVal} {myFunc()} seven" +
$"six {mystring}";
Edit: This question was prompted by the following underlying question:
I'm making a function that takes a POCO and outputs a new line in a CSV. How do I make the following code not slide off the edge of the screen without sacrificing performance?
public override void Append(MyCustomObject obj)
{
    var line = obj.Process();
    if (__file is null)
        throw new FileNotFoundException("No file open for appending!");
    string outstr = $@"""{obj.CogType}"",""{CognosObject.process_xpath(obj.SearchPath)}"",""{obj.DefaultName}"",""{CognosObject.process_pathurl(obj.PathUrl)}"",""{obj.Notes}""";
    lock (__file)
        __file.Write(Encoding.UTF8.GetBytes(outstr));
}
As far as I can tell, doing any sort of trick to it will increase runtime overhead.
C# does different things depending on what exactly you're interpolating. Let's look at a simplified example:
string mystring = "Onetwothree";
string test = "eight"+
$"four five {mystring} seven"+
$"six {mystring}";
According to LINQPad, the compiler rewrites this as:
string mystring = "Onetwothree";
string test = string.Concat ("eightfour five ", mystring, " sevensix ", mystring);
In this case, the compiler recognized that all I was doing was concatenating string variables, so it merged the constant parts at compile time and turned the rest into a call to String.Concat(string, string, string, string).
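To check that equivalence yourself, here is a minimal sketch that builds the string both ways and compares the results. The variable names interpolated and concatenated are mine, and the second line is a hand-written copy of the lowered call, not something the compiler emits under that name.

```csharp
using System;

string mystring = "Onetwothree";

// Interpolated form: every hole is a plain string with no format specifier.
string interpolated = "eight" +
    $"four five {mystring} seven" +
    $"six {mystring}";

// Hand-written copy of the lowered call reported by LINQPad.
string concatenated = string.Concat("eightfour five ", mystring, " sevensix ", mystring);

Console.WriteLine(interpolated == concatenated); // True
Console.WriteLine(interpolated); // eightfour five Onetwothree sevensix Onetwothree
```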
OK, so how about the example from the question? Here the compiler does something a little different. Let's look at the compiler-generated code (again, via LINQPad):
string mystring = "Onetwothree";
int myVal = 7;
DefaultInterpolatedStringHandler defaultInterpolatedStringHandler = new DefaultInterpolatedStringHandler (18, 3);
defaultInterpolatedStringHandler.AppendLiteral ("four ");
defaultInterpolatedStringHandler.AppendFormatted (mystring);
defaultInterpolatedStringHandler.AppendLiteral (" five ");
defaultInterpolatedStringHandler.AppendFormatted (myVal);
defaultInterpolatedStringHandler.AppendLiteral (" ");
defaultInterpolatedStringHandler.AppendFormatted (<Main>g__myFunc|4_0 ());
defaultInterpolatedStringHandler.AppendLiteral (" seven");
string test = string.Concat ("eight", defaultInterpolatedStringHandler.ToStringAndClear (), "six ", mystring);
This time, it's using a DefaultInterpolatedStringHandler. Essentially, it builds the interpolated string with this struct, which does the heavy lifting, then passes the result to string.Concat() along with the remaining pieces.
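The two constructor arguments are the total length of the literal pieces and the number of interpolation holes. As a rough sketch, assuming .NET 6 or later (where DefaultInterpolatedStringHandler lives in System.Runtime.CompilerServices), you can drive the handler by hand and compare it with the interpolated string. The names handler, manual and viaInterpolation are mine.

```csharp
using System;
using System.Runtime.CompilerServices;

string myFunc() => "five six seven";
string mystring = "Onetwothree";
int myVal = 7;

// 18 = combined length of the literal pieces:
//      "four " (5) + " five " (6) + " " (1) + " seven" (6)
// 3  = number of interpolation holes
var handler = new DefaultInterpolatedStringHandler(18, 3);
handler.AppendLiteral("four ");
handler.AppendFormatted(mystring);
handler.AppendLiteral(" five ");
handler.AppendFormatted(myVal);
handler.AppendLiteral(" ");
handler.AppendFormatted(myFunc());
handler.AppendLiteral(" seven");
string manual = handler.ToStringAndClear();

// The same text, written as an ordinary interpolated string.
string viaInterpolation = $"four {mystring} five {myVal} {myFunc()} seven";

Console.WriteLine(manual == viaInterpolation); // True
```

In normal code there is no reason to write this by hand; it is only meant to show what the generated calls do.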
Ok, so how about a slightly more complicated example?
string myFunc()
{
    return "five six seven";
}
string mystring = "Onetwothree";
int myVal = 7;
string test = "eight" +
$"four {mystring} five {myVal}"+$"beep {myFunc()} {myVal}seven" +
$"six {mystring}";
The salient difference here is that I'm now connecting TWO "true" interpolated strings (as opposed to the string $"six {mystring}", which becomes string concatenation). On the backend?
string mystring = "Onetwothree";
int myVal = 7;
string[] obj = new string[5] { "eight", null, null, null, null };
DefaultInterpolatedStringHandler defaultInterpolatedStringHandler = new DefaultInterpolatedStringHandler (11, 2);
defaultInterpolatedStringHandler.AppendLiteral ("four ");
defaultInterpolatedStringHandler.AppendFormatted (mystring);
defaultInterpolatedStringHandler.AppendLiteral (" five ");
defaultInterpolatedStringHandler.AppendFormatted (myVal);
obj [1] = defaultInterpolatedStringHandler.ToStringAndClear ();
defaultInterpolatedStringHandler = new DefaultInterpolatedStringHandler (11, 2);
defaultInterpolatedStringHandler.AppendLiteral ("beep ");
defaultInterpolatedStringHandler.AppendFormatted (<Main>g__myFunc|4_0 ());
defaultInterpolatedStringHandler.AppendLiteral (" ");
defaultInterpolatedStringHandler.AppendFormatted (myVal);
defaultInterpolatedStringHandler.AppendLiteral ("seven");
obj [2] = defaultInterpolatedStringHandler.ToStringAndClear ();
obj [3] = "six ";
obj [4] = mystring;
string test = string.Concat (obj);
Well, for one, there's no string.Concat overload that takes five strings, so it's converted to a call to string.Concat(string[]) instead. Makes sense.
But notice! It creates two DefaultInterpolatedStringHandler (DISH) instances. The compiler isn't even attempting to merge them. In the x64 native code, the ret is located at L0210, and if I manually combine the interpolated strings, it moves up to L0186, a direct saving of ~138 bytes, though I would imagine the extra calls to ToStringAndClear and DISH's ctor also involve a performance hit.
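By "manually combine" I mean merging the adjacent interpolated strings into a single literal. The sketch below uses a trimmed-down version of the example (the variable names split and merged are mine) and only checks that both forms produce the same string; it does not measure the generated code.

```csharp
using System;

string myFunc() => "five six seven";
string mystring = "Onetwothree";
int myVal = 7;

// Two adjacent interpolated strings, each containing the int hole {myVal}.
string split = "eight" +
    $"four {mystring} five {myVal}" + $"beep {myFunc()} {myVal}" +
    $"six {mystring}";

// The same text with the two middle pieces merged into one interpolated string.
string merged = "eight" +
    $"four {mystring} five {myVal}beep {myFunc()} {myVal}" +
    $"six {mystring}";

Console.WriteLine(split == merged); // True
```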
TLDR:
Simple interpolated strings become calls to string.Concat. More complex things don't necessarily operate in an optimal way, and although you should be aware of that, it's probably not something that matters performance-wise unless it's in your inner loop.
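For the CSV line from the question, one option that keeps a single interpolated string is to hoist each field into a short local first. This is only a sketch: ProcessXPath and ProcessPathUrl are hypothetical stand-ins for the CognosObject helpers, the sample values are made up, and I am assuming every field is a string. The locals hold exactly the values the holes would have computed inline, so the interpolation itself has the same pieces to work with.

```csharp
using System;

// Hypothetical stand-ins for CognosObject.process_xpath / process_pathurl.
string ProcessXPath(string s) => s.Trim();
string ProcessPathUrl(string s) => s.Trim();

// Made-up sample values standing in for the POCO's properties.
string cogType = "report";
string searchPath = " /content/a ";
string defaultName = "Sales";
string pathUrl = " http://host/x ";
string notes = "n/a";

// Hoist the long expressions into short locals...
string path = ProcessXPath(searchPath);
string url = ProcessPathUrl(pathUrl);

// ...so the single interpolated string stays readable.
string outstr = $@"""{cogType}"",""{path}"",""{defaultName}"",""{url}"",""{notes}""";

Console.WriteLine(outstr); // "report","/content/a","Sales","http://host/x","n/a"
```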