I have two text boxes, one for the input and another for the output. I need to keep only the hexadecimal characters from the input and output them in uppercase. I have checked that using regular expressions (Regex) is much faster than using a loop.
My current code uppercases the input first and then filters the hex digits, as follows:
string strOut = Regex.Replace(inputTextBox.Text.ToUpper(), "[^0-9^A-F]", "");
outputTextBox.Text = strOut;
Alternatively:
string strOut = Regex.Replace(inputTextBox.Text, "[^0-9^A-F^a-f]", "");
outputTextBox.Text = strOut.ToUpper();
The input may contain up to 32k characters, so speed is important here. I have used TimeSpan to measure, but the results are not consistent.
My question is: which code has better speed performance and why?
This is definitely a case of premature optimization: 32K characters is not a big deal for finely tuned regex engines running on modern computers, so this optimization task is mostly theoretical.
Before discussing performance, it's worth pointing out that the expressions are probably not doing what you want, because they allow ^ characters into the output: inside a character class, ^ negates only when it is the first character, and anywhere else it is a literal. You need to use [^0-9A-F] and [^0-9A-Fa-f] instead.
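To make the difference concrete, here is a minimal sketch that puts the original pattern next to the corrected ones. The class and method names (HexFilterDemo, OriginalPattern, UpperThenFilter, FilterThenUpper) are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class HexFilterDemo
{
    // The pattern from the question: the extra '^' is a literal member of
    // the class, so '^' characters are NOT removed.
    public static string OriginalPattern(string input) =>
        Regex.Replace(input.ToUpper(), "[^0-9^A-F]", "");

    // Corrected option 1: uppercase first, then strip everything
    // that is not 0-9 or A-F.
    public static string UpperThenFilter(string input) =>
        Regex.Replace(input.ToUpper(), "[^0-9A-F]", "");

    // Corrected option 2: strip first, then uppercase the
    // (possibly shorter) result.
    public static string FilterThenUpper(string input) =>
        Regex.Replace(input, "[^0-9A-Fa-f]", "").ToUpper();
}
```

For the input "1a^zF", the original pattern keeps the caret and yields "1A^F", while both corrected versions yield "1AF".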
The speed of the two regexes will be identical, because the number of characters in a character class hardly makes a difference. However, in the second version ToUpper is called on a potentially shorter string, because all invalid characters have already been removed. Therefore, the second option is potentially slightly faster.
However, if you must optimize this to the last CPU cycle, you can rewrite this without regular expressions to avoid the extra string allocation made by ToUpper: walk through the input string in a loop and append each valid character to a StringBuilder as you go. When you see a lowercase hex digit (a-f), convert it to uppercase before appending.
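A minimal sketch of that loop might look like the following. The names HexFilter and ToUpperHex are hypothetical, and the code assumes only the ASCII hex digits 0-9, A-F and a-f count as valid.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class HexFilter
{
    public static string ToUpperHex(string input)
    {
        // Pre-size the builder: the output can never be longer than the input.
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'F'))
            {
                sb.Append(c); // already a valid uppercase hex digit
            }
            else if (c >= 'a' && c <= 'f')
            {
                sb.Append((char)(c - 'a' + 'A')); // lowercase hex digit: convert to uppercase
            }
            // every other character is dropped
        }
        return sb.ToString();
    }
}
```

It could then be wired up as outputTextBox.Text = HexFilter.ToUpperHex(inputTextBox.Text);. Whether this actually beats the regex version on 32k characters is something to measure rather than assume.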