I am trying to build a scanner for AWK source code using (F)Lex. I have been able to identify AWK keywords, comments, string literals, and digits; however, I am stuck on how to write regular expressions that match variable names, since these are quite dynamic.
Could someone please help me develop a regular expression for matching AWK variables? http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html provides the definition of the AWK language.
Variables must start with a letter but can otherwise be alphanumeric, in either case. The only special character that can be used is an underscore ("_"). I apologize; I am not very experienced with regular expressions, let alone regular expressions for Flex.
Thank you for your help.
[a-zA-Z_][a-zA-Z_0-9]*
An alphabetic character or underscore first (the POSIX grammar allows a leading underscore), followed by zero or more alphanumerics or underscores.
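As a rough illustration of what that pattern accepts, here is a small hand-written C sketch (not Flex output) that returns the length of the longest prefix matching [a-zA-Z_][a-zA-Z_0-9]*, which is what Flex does when it tries the rule at the current input position. It assumes the default "C" locale. Because the same pattern also matches keywords such as while, the sketch includes a hypothetical is_keyword check with a deliberately partial keyword list. In Flex itself you would normally get that effect by listing the keyword rules before the name rule: when two rules match the same number of characters, Flex picks the one listed first.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns the length of the longest prefix of s matching
   [a-zA-Z_][a-zA-Z_0-9]*, or 0 if s does not start with a name.
   Assumes the default "C" locale, where isalpha/isalnum accept
   exactly the ASCII letters and digits. */
size_t match_name(const char *s)
{
    size_t n;
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return 0;
    n = 1;
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Partial keyword list, for illustration only (hypothetical helper data). */
static const char *const keywords[] = {
    "BEGIN", "END", "function", "if", "else", "while", "for",
    "do", "break", "continue", "next", "exit", "return",
    "delete", "getline", "in", "print", "printf"
};

/* Returns 1 if the len-character name starting at s is one of the
   keywords above, so the caller can report a keyword, not a variable. */
int is_keyword(const char *s, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strlen(keywords[i]) == len && strncmp(s, keywords[i], len) == 0)
            return 1;
    return 0;
}
```

For example, match_name("NF+1") stops at the + and returns 2, while match_name("9lives") returns 0 because a name cannot start with a digit.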
Special cases will be fields, which are prefixed by $:
$0
$1
and also
$NF
$i
You'll have to decide how you're going to deal with those.
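One option: in the POSIX awk grammar a field reference is written as $ followed by an expression, which is why $NF, $i and even $(i+1) are legal. So the scanner can return $ as a token of its own and leave the operand to the parser. The plain-C sketch below (not Flex code) shows that split. The token names such as TOK_DOLLAR are hypothetical; in a Flex/Yacc setup you would return the token codes your grammar defines, and only integer literals are handled here to keep it short.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch only. */
enum tok { TOK_END, TOK_DOLLAR, TOK_NAME, TOK_NUMBER, TOK_CHAR };

/* Scans one token starting at *p, advances *p past it, and returns its
   kind. Blanks and tabs are skipped. '$' is always returned on its own,
   so "$NF", "$1" and "$(i+1)" all come out as TOK_DOLLAR followed by
   the tokens of the operand. */
enum tok next_token(const char **p)
{
    const char *s = *p;
    while (*s == ' ' || *s == '\t')
        s++;
    if (*s == '\0') {
        *p = s;
        return TOK_END;
    }
    if (*s == '$') {
        *p = s + 1;
        return TOK_DOLLAR;
    }
    if (isalpha((unsigned char)*s) || *s == '_') {
        while (isalnum((unsigned char)*s) || *s == '_')
            s++;
        *p = s;
        return TOK_NAME;
    }
    if (isdigit((unsigned char)*s)) {
        while (isdigit((unsigned char)*s))
            s++;
        *p = s;
        return TOK_NUMBER;
    }
    *p = s + 1; /* any other single character, e.g. '(' or '+' */
    return TOK_CHAR;
}

/* Stores the token kinds of src in out[] (at most max entries, not
   counting the final TOK_END) and returns how many were stored. */
size_t tokenize(const char *src, enum tok *out, size_t max)
{
    size_t n = 0;
    enum tok t;
    while (n < max && (t = next_token(&src)) != TOK_END)
        out[n++] = t;
    return n;
}
```

With this split, $NF scans as TOK_DOLLAR TOK_NAME and $(i+1) as TOK_DOLLAR followed by the six-token parenthesized expression, and the grammar rule for field references decides what the operand may be.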