I am trying to optimize my code by removing a method (inlining its body into the loop). I didn't gain any performance; in fact, the "optimized" code is slower! Is calling a method faster than creating a variable in the loop? Why?
Why is the following code faster (1.3-1.5 seconds)
public void getPureText(string notClearedText)
{
    string peeledText = "";
    foreach (var symbol in notClearedText)
    {
        if (isCyrillic(symbol))
            peeledText += Char.ToLower(symbol);
        else
            peeledText += " ";
    }
}

private bool isCyrillic(int letterCode)
{
    switch (letterCode)
    {
        case 1028: // Є
        case 1108: // є
        case 1030: // І
        case 1110: // і
        case 1031: // Ї
        case 1111: // ї
        case 1168: // Ґ
        case 1169: // ґ
        case 32:   // " "
        case 39:   // '
        //case 45: // -
            return true;
        default:
            return
                1040 <= letterCode && letterCode <= 1103 && // Cyrillic
                letterCode != 1066 && // Ъ
                letterCode != 1067 && // Ы
                letterCode != 1098    // ъ
                ||
                65 <= letterCode && letterCode <= 90
                ||
                97 <= letterCode && letterCode <= 122
                ;
    }
}
than the "optimized" version (1.5-1.8 seconds)? What am I missing?
public void getPureText(string notClearedText)
{
    string peeledText = "";
    foreach (var symbol in notClearedText)
    {
        int letterCode = symbol;
        switch (letterCode)
        {
            case 1028: // Є
            case 1108: // є
            case 1030: // І
            case 1110: // і
            case 1031: // Ї
            case 1111: // ї
            case 1168: // Ґ
            case 1169: // ґ
            case 32:   // " "
            case 39:   // '
            //case 45: // -
                peeledText += Char.ToLower(symbol);
                break;
            default:
                if (
                    1040 <= letterCode && letterCode <= 1103 && // Cyrillic
                    letterCode != 1066 && // Ъ
                    letterCode != 1067 && // Ы
                    letterCode != 1098    // ъ
                    ||
                    65 <= letterCode && letterCode <= 90
                    ||
                    97 <= letterCode && letterCode <= 122
                   )
                    peeledText += Char.ToLower(symbol);
                else
                    peeledText += " ";
                break;
        }
    }
}
I have run dozens of tests using
void TestPerformance()
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    _textRepository.getPureText(RawTextExamples.veryLongText);
    sw.Stop();
    unitTestFormGuess.show(sw.Elapsed.ToString());
}
P.S. As you can see, I removed some code from getPureText() and made it return void, then measured the time again: same result. Something is wrong here...
P.P.S. Configuration: Debug.
EDIT

I changed the type of peeledText from string to StringBuilder.
Configuration: Release.
The size of the string is the same: 150 KB.
3 series of tests, 500 iterations each:
- With method isCyrillic: 6.63-6.70 milliseconds
- Inlined: 6.80-6.90 milliseconds (still slower o_0)
- Inlined, but using RegEx: 6.62-6.70 milliseconds
- With method isCyrillic, but using a HashSet instead of the switch: 7.89-8.32 milliseconds
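If the code takes 1.6 seconds to process one single input string, then that string is a rather large one. I'd stop using string concatenation (+=) and start using a System.Text.StringBuilder instead. Strings in .NET are immutable, so every += allocates a new string and copies everything built so far into it, which makes the loop quadratic in the length of the input. A StringBuilder appends into a growable buffer, so it's much faster and much more memory efficient: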
public void getPureText(string notClearedText)
{
    var peeledText = new StringBuilder(notClearedText.Length);
    foreach (var symbol in notClearedText)
    {
        if (isCyrillic(symbol))
            peeledText.Append(Char.ToLower(symbol));
        else
            peeledText.Append(' ');
    }
}
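A rough cost model for why += was so slow here, assuming one character is appended per loop iteration and that each concatenation copies the whole string built so far (it must, since strings are immutable). At step k the new string has length k, so for an input of n characters:

$$
\text{chars copied with } {+}{=} \;=\; \sum_{k=1}^{n} k \;=\; \frac{n(n+1)}{2} \;=\; O(n^2),
\qquad
\text{chars written with a presized StringBuilder} \;\approx\; n \;=\; O(n).
$$

For an assumed round figure of n = 100,000 characters (not the exact count of the 150 KB text), that is 5,000,050,000 character copies versus about 100,000 appends, plus n short-lived intermediate strings for the garbage collector to reclaim. This is consistent with the drop from seconds to milliseconds reported in the EDIT, and it dwarfs any difference between calling isCyrillic and inlining it.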
If you are calling getPureText in a tight loop, you might consider reusing the buffer: simply clear it instead of paying the cost of allocating a new one on each call.
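A minimal sketch of that buffer reuse, assuming the method may return the cleaned string instead of void. The class name TextCleaner and the method names GetPureText and IsCyrillic are hypothetical; IsCyrillic copies the predicate from the question unchanged.

```csharp
using System;
using System.Text;

// Hypothetical class: the question's repository class is not shown.
public class TextCleaner
{
    // One buffer per instance, reused across calls. Not thread-safe:
    // concurrent calls on the same instance would corrupt each other's output.
    private readonly StringBuilder _buffer = new StringBuilder();

    // Assumption: returns the cleaned text instead of void, so the result can be used.
    public string GetPureText(string notClearedText)
    {
        _buffer.Clear();                               // empties the buffer, keeps its capacity
        _buffer.EnsureCapacity(notClearedText.Length); // grows once, up front, if needed

        foreach (char symbol in notClearedText)
        {
            // Allowed characters are lowercased; everything else becomes a space.
            _buffer.Append(IsCyrillic(symbol) ? char.ToLower(symbol) : ' ');
        }
        return _buffer.ToString();                     // still allocates the result string
    }

    // Same predicate as isCyrillic in the question (Cyrillic, Latin, space and apostrophe).
    private static bool IsCyrillic(int letterCode)
    {
        switch (letterCode)
        {
            case 1028: // Є
            case 1108: // є
            case 1030: // І
            case 1110: // і
            case 1031: // Ї
            case 1111: // ї
            case 1168: // Ґ
            case 1169: // ґ
            case 32:   // " "
            case 39:   // '
                return true;
            default:
                return
                    1040 <= letterCode && letterCode <= 1103 && // Cyrillic
                    letterCode != 1066 && // Ъ
                    letterCode != 1067 && // Ы
                    letterCode != 1098    // ъ
                    ||
                    65 <= letterCode && letterCode <= 90
                    ||
                    97 <= letterCode && letterCode <= 122;
        }
    }
}
```

For example, new TextCleaner().GetPureText("Привіт, World!") returns "привіт  world " (the comma and the exclamation mark each become a space). Reusing the builder saves the StringBuilder allocation and its growth on every call, but ToString() still allocates the result; if several threads share one instance, give each thread its own TextCleaner instead.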
Benchmark it again in Release mode, discard the warm-up run, and run without the debugger attached. If that still doesn't meet your performance goals, then start micro-optimizing: inlining calls, etc. The JIT compiler is pretty smart at optimizing code, so manual inlining is probably not going to gain you much.
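double Benchmark(string veryLongText, int repetitions)
{
    getPureText(veryLongText); // warm-up run: JIT-compiles the method so it isn't timed
    var watch = Stopwatch.StartNew();
    for (var i = 0; i < repetitions; i++)
        getPureText(veryLongText);
    watch.Stop();
    // TotalMilliseconds is a double; ElapsedMilliseconds / repetitions would be
    // integer division and truncate an average like 6.6 ms down to 6.
    return watch.Elapsed.TotalMilliseconds / repetitions;
}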
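To compare the method and inlined variants the way the EDIT does (several series of many iterations), a small harness along these lines may help. It is a sketch, not a replacement for a dedicated benchmarking library: the names BenchmarkHarness and Measure are hypothetical, and it only warns (via the DEBUG symbol and Debugger.IsAttached) when the Release/no-debugger advice above is not being followed.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: names and signature are illustrative, not from the question.
public static class BenchmarkHarness
{
    // Runs `series` batches of `repetitions` calls each and returns
    // the average milliseconds per call for every batch.
    public static double[] Measure(Action action, int series, int repetitions)
    {
#if DEBUG
        Console.WriteLine("Warning: Debug build; timings are not representative.");
#endif
        if (Debugger.IsAttached)
            Console.WriteLine("Warning: debugger attached; timings are not representative.");

        action(); // warm-up call: JIT-compiles the code path before anything is timed

        var results = new double[series];
        for (int s = 0; s < series; s++)
        {
            // Collect garbage left by the previous batch so each batch starts from a similar heap state.
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            var watch = Stopwatch.StartNew();
            for (int i = 0; i < repetitions; i++)
                action();
            watch.Stop();

            results[s] = watch.Elapsed.TotalMilliseconds / repetitions;
        }
        return results;
    }
}
```

Usage, with the names from the question: BenchmarkHarness.Measure(() => _textRepository.getPureText(RawTextExamples.veryLongText), 3, 500). The delegate adds one indirect call per iteration, which is the same for every variant and negligible next to a call that takes several milliseconds.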