I'm reading all the lines from a TextBox and trying to remove all the whitespace from the resulting list.
I need to be able to tokenize the following expression:
if(x==0)
{
cout<<x;
}
into
if
(
x
==
0
)
{
cout
<<
x
;
}
My code:
public static string[] Tokenize(string sourceCode)
{
    Regex RE = new Regex(@"([\s+\+\-\*\%\,\;\&\|\<\>\=\!\{\}])");
    string[] x = RE.Split(sourceCode);
    var list = new List<string>(x);
    list.Remove(" ");
    for (int m = 0; m < list.Count(); m++)
    {
        Console.WriteLine(list[m]);
    }
    return (RE.Split(sourceCode));
}
My output:
if(x
=
=
0)
{
cout
<
<
x
;
}
How can I split on multi-character symbols such as ==, << and &&, and how can I remove the spaces from the list?
Is there a better way of achieving what I want?
I agree with @juharr's comment.
But if you really want to use a regex, it is better to use the Matches method instead of Split, because it lets you describe the tokens you are looking for rather than the boundaries between them:
Regex RE = new Regex(@"\w+|\(|\)|\++|-+|\*|%|,|;|&+|\|+|<+|>+|=+|!|\{|\}");
foreach (Match m in RE.Matches(sourceCode))
{
    Console.WriteLine(m.Value);
}
Result:
if
(
x
==
0
)
{
cout
<<
x
;
}
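The loop above only prints the tokens. Below is a minimal sketch of the string[] Tokenize(string) method from the question built on the same Matches idea. The class name Tokenizer and the exact operator list (which adds !=, <=, >=, >>, ||, ++, --, / and a few other characters) are my own assumptions, so adjust them to the language you are tokenizing. Unlike the <+ and =+ alternatives above, it lists each two-character operator explicitly, ahead of the single-character class, so != comes back as one token and <<< is split into << and <.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class Tokenizer
{
    // .NET tries alternatives left to right at each position, so the
    // two-character operators must come before the single-character class;
    // otherwise "==" would be returned as "=", "=".
    // The operator set here is an assumption; extend it as needed.
    private static readonly Regex TokenPattern = new Regex(
        @"\w+|==|!=|<=|>=|<<|>>|&&|\|\||\+\+|--|[-+*/%=<>!&|^~?:;,.(){}\[\]]");

    public static string[] Tokenize(string sourceCode)
    {
        // Matches returns only text that matches the pattern, so whitespace
        // (and any character not listed above) never appears in the result.
        return TokenPattern.Matches(sourceCode)
                           .Cast<Match>()
                           .Select(m => m.Value)
                           .ToArray();
    }
}
```

For example, Tokenizer.Tokenize("if(x==0)\n{\ncout<<x;\n}") returns the same twelve tokens as the Result list above. Because Matches skips every character that no alternative matches, unlisted characters such as # or a quote are silently dropped too, so add them to the pattern if you need them. As for the Split approach in the question: List<T>.Remove removes only the first matching element, so something like list.RemoveAll(string.IsNullOrWhiteSpace) would be needed to drop every empty or whitespace entry.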