I'm trying to start with tokenizing, since I want to write my own scripting language based on C#.
For now I'm just playing around a bit and learning regex in more depth, so I'm still pretty new to it.
For example, I want to match `foreach(str x:test.GetItems())` and get the group values `str`, `x`, and `test.GetItems()`.
My regex is:
foreach\s*\((\s*([A-Za-z0-9]+)\s+([A-Za-z0-9]+))\s*\:\s*(.+)\)
and this works so far.
So my questions are:
1. Is it good practice to have hard-coded character sequences like `foreach` in my regex? If not, what would you prefer me to do?
2. For the `:` in my syntax, I can write either `:` or `\:` in the regex. RegExr.com allows both and matches both (it displays them in different colors, but for each it says "Matches a ":" character (char code 58)"). Should I escape the character, or shouldn't I?

> Is it good practice to have something like hard-coded character-sequences [...] in my regex?
If you need to match a literal string (`foreach`, `using`, or even `potato`), write it as is. There is absolutely no reason to escape it, split it up, or do anything else with it.
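As a minimal sketch of this point, the snippet below runs the asker's pattern unchanged, with the keyword `foreach` written literally. It uses Python's `re` module purely for illustration (the asker's target engine may differ), and the helper name `parse_foreach` is hypothetical.

```python
import re

# The asker's pattern, unchanged: the keyword "foreach" is written as is.
PATTERN = re.compile(
    r"foreach\s*\((\s*([A-Za-z0-9]+)\s+([A-Za-z0-9]+))\s*\:\s*(.+)\)"
)


def parse_foreach(source):
    """Return (type, variable, iterable), or None if source is not a foreach header."""
    match = PATTERN.fullmatch(source)
    if match is None:
        return None
    # Group 1 is the outer "type name" pair; groups 2-4 are the parts we want.
    return match.group(2), match.group(3), match.group(4)
```

Note that the pattern has four capturing groups, not three: the outer parentheses around the type and variable name form group 1, so the three values asked for live in groups 2, 3, and 4.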
> For the `:` in my syntax, I can write in regex `:` or also `\:`. [...] Should I escape the character, or shouldn't I?
Since `:` doesn't have any special meaning on its own, you don't have to escape it. Furthermore, you shouldn't escape it, because some regex engines might raise a syntax error on an unnecessary escape.
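A small sketch of the equivalence, again assuming Python's `re` module as a stand-in engine: there, `\:` and `:` compile to the same literal match. Other engines are stricter; JavaScript regexes with the `u` flag, for example, reject `\:` as an invalid escape. The names `ESCAPED`, `UNESCAPED`, and `colon_spans` are made up for this example.

```python
import re

# Two spellings of the same separator: an escaped and an unescaped colon.
ESCAPED = re.compile(r"x\s*\:\s*test")
UNESCAPED = re.compile(r"x\s*:\s*test")


def colon_spans(text):
    """Return the match span found by each variant (None where there is no match)."""
    spans = []
    for pattern in (ESCAPED, UNESCAPED):
        match = pattern.search(text)
        spans.append(match.span() if match else None)
    return spans
```

Both entries in the returned list are identical for any input, which is why the unescaped form is the safer, more portable choice.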
The color mismatch may be due to improper parsing, as this screenshot suggests: `c` should be purple, or all the others should be black (except `\s`).