Regex style guide with plaintext


I'm starting with tokenizing, since I want to write my own scripting language based on C#.

For now, I'm just playing around a bit and learning regex in more depth. I'm pretty new to regex.

For example, I want to match

foreach(str x:test.GetItems())

and get the group values str, x, and test.GetItems().

My regex is:

foreach\s*\((\s*([A-Za-z0-9]+)\s+([A-Za-z0-9]+))\s*\:\s*(.+)\)

and this works so far.
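As a quick check of the pattern above, here is a minimal sketch using Python's re module as a stand-in (an assumption: the post does not say which engine or host language the tokenizer will use, and engines differ in details). It shows that the outer parentheses around the type and name create an extra group, so the three wanted values land in groups 2, 3 and 4. The second pattern, named TIGHTER here purely for illustration, drops that outer group and the backslash before the colon.

```python
import re

# Pattern copied from the question; the ':' is escaped exactly as posted.
PATTERN = re.compile(
    r"foreach\s*\((\s*([A-Za-z0-9]+)\s+([A-Za-z0-9]+))\s*\:\s*(.+)\)"
)

# Illustrative variant: no outer group, no escape on ':'.
TIGHTER = re.compile(
    r"foreach\s*\(\s*([A-Za-z0-9]+)\s+([A-Za-z0-9]+)\s*:\s*(.+)\)"
)

source = "foreach(str x:test.GetItems())"

# Group 1 is the outer "type name" pair; groups 2-4 are the wanted values.
original_groups = PATTERN.search(source).groups()

# Without the outer parentheses the wanted values are groups 1-3.
tighter_groups = TIGHTER.search(source).groups()

print(original_groups)  # ('str x', 'str', 'x', 'test.GetItems()')
print(tighter_groups)   # ('str', 'x', 'test.GetItems()')
```

If the outer grouping is needed for readability, a non-capturing group, written (?: ... ), keeps the structure without shifting the group numbers.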

So my questions are:

  • Is it good practice to have hard-coded character sequences, e.g. foreach, in my regex? If not, what would you prefer me to do?
  • For the : in my syntax, I can write either : or \: in the regex. RegExr.com accepts both and matches both, but displays them in different colors. In both cases it reports: Matches a ":" character (char code 58). Should I escape the character, or shouldn't I?

Solution

  • Is it good practice to have something like hard-coded character-sequences [...] in my regex?

    If you need to match a literal string (foreach, using, or even potato), write it as is. There is no reason to escape it, split it, or do anything else with it.

  • for the : in my syntax, I can write in regex : or also \:. [...] Should I escape the character, or shouldn't I?

    Since : doesn't have any special meaning on its own, you don't have to escape it. Furthermore, you shouldn't escape it, because some regex engines might raise a syntax error on an unnecessary escape.

    The color mismatch may be due to improper parsing by the highlighter, as this screenshot suggests: [screenshot]

    There, the c should be purple, or all the other characters should be black (except \s).
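To illustrate the point about escaping, here is a small sketch, again assuming Python's re module as a stand-in for whichever engine is actually used. In this engine an escaped and an unescaped colon behave identically, while a backslash in front of an ASCII letter with no defined meaning (\q is used here as an arbitrary example) is rejected when the pattern is compiled. Other engines draw that line in different places, which is why leaving unneeded escapes out is the safer habit.

```python
import re

text = "x:test"

# An escaped and an unescaped colon both match the literal ':' here.
plain = re.search(r"x:test", text)
escaped = re.search(r"x\:test", text)
same_match = plain.group(0) == escaped.group(0)

# A needless escape is not always harmless: Python's re module refuses
# a backslash before an ASCII letter that has no defined meaning.
try:
    re.compile(r"\q")
    needless_escape_rejected = False
except re.error:
    needless_escape_rejected = True

print(same_match, needless_escape_rejected)
```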