I'm wondering if someone can explain to me what exactly the compiler might be doing for me to observe such extreme differences in performance for a simple method.
public static uint CalculateCheckSum(string str) {
char[] charArray = str.ToCharArray();
uint checkSum = 0;
foreach (char c in charArray) {
checkSum += c;
}
return checkSum % 256;
}
I'm working with a colleague doing some benchmarking/optimizations for a message processing application. Doing 10 million iterations of this function using the same input string took about 25 seconds in Visual Studio 2012, however when the project was built using the "Optimize Code" option turned on the same code executed in 7 seconds for the same 10 million iterations.
I'm very interested to understand what the compiler is doing behind the scenes for us to be able to see a greater than 3x performance increase for a seemingly innocent block of code such as this.
As requested, here is a complete Console application that demonstrates what I am seeing.
class Program
{
public static uint CalculateCheckSum(string str)
{
char[] charArray = str.ToCharArray();
uint checkSum = 0;
foreach (char c in charArray)
{
checkSum += c;
}
return checkSum % 256;
}
static void Main(string[] args)
{
string stringToCount = "8=FIX.4.29=15135=D49=SFS56=TOMW34=11752=20101201-03:03:03.2321=DEMO=DG00121=155=IBM54=138=10040=160=20101201-03:03:03.23244=10.059=0100=ARCA10=246";
Stopwatch stopwatch = Stopwatch.StartNew();
for (int i = 0; i < 10000000; i++)
{
CalculateCheckSum(stringToCount);
}
stopwatch.Stop();
Console.WriteLine(stopwatch.Elapsed);
}
}
Running in debug with Optimization off I see 13 seconds, on I get 2 seconds.
Running in Release with Optimization off 3.1 seconds and on 2.3 seconds.
To look at what the C# compiler does for you, you need to look at the IL. If you want to see how that affects the JITted code, you'll need to look at the native code as described by Scott Chamberlain. Be aware that the JITted code will vary based on processor architecture, CLR version, how the process was launched, and possibly other things.
I would usually start with the IL, and then potentially look at the JITted code.
Comparing the IL using ildasm
can be slightly tricky, as it includes a label for each instruction. Here are two versions of your method compiled with and without optimization (using the C# 5 compiler), with extraneous labels (and nop
instructions) removed to make them as easy to compare as possible:
Optimized
.method public hidebysig static uint32
CalculateCheckSum(string str) cil managed
{
// Code size 46 (0x2e)
.maxstack 2
.locals init (char[] V_0,
uint32 V_1,
char V_2,
char[] V_3,
int32 V_4)
ldarg.0
callvirt instance char[] [mscorlib]System.String::ToCharArray()
stloc.0
ldc.i4.0
stloc.1
ldloc.0
stloc.3
ldc.i4.0
stloc.s V_4
br.s loopcheck
loopstart:
ldloc.3
ldloc.s V_4
ldelem.u2
stloc.2
ldloc.1
ldloc.2
add
stloc.1
ldloc.s V_4
ldc.i4.1
add
stloc.s V_4
loopcheck:
ldloc.s V_4
ldloc.3
ldlen
conv.i4
blt.s loopstart
ldloc.1
ldc.i4 0x100
rem.un
ret
} // end of method Program::CalculateCheckSum
Unoptimized
.method public hidebysig static uint32
CalculateCheckSum(string str) cil managed
{
// Code size 63 (0x3f)
.maxstack 2
.locals init (char[] V_0,
uint32 V_1,
char V_2,
uint32 V_3,
char[] V_4,
int32 V_5,
bool V_6)
ldarg.0
callvirt instance char[] [mscorlib]System.String::ToCharArray()
stloc.0
ldc.i4.0
stloc.1
ldloc.0
stloc.s V_4
ldc.i4.0
stloc.s V_5
br.s loopcheck
loopstart:
ldloc.s V_4
ldloc.s V_5
ldelem.u2
stloc.2
ldloc.1
ldloc.2
add
stloc.1
ldloc.s V_5
ldc.i4.1
add
stloc.s V_5
loopcheck:
ldloc.s V_5
ldloc.s V_4
ldlen
conv.i4
clt
stloc.s V_6
ldloc.s V_6
brtrue.s loopstart
ldloc.1
ldc.i4 0x100
rem.un
stloc.3
br.s methodend
methodend:
ldloc.3
ret
}
Points to note:
blt.s
rather than clt
followed by brtrue.s
when checking whether or not to go round the loop again (this is the reason for one of the extra locals).