c regex awk flex-lexer

FLEX for generating AWK scanner - Identify Variables


I am trying to build a scanner for AWK source code using (F)Lex. I have been able to identify AWK keywords, comments, string literals, and digits; however, I am stuck on how to write regular expressions for matching variable names, since these are quite dynamic.

Could someone please help me develop a regular expression for matching AWK variables? http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html provides the definition of the AWK language.

Variables must start with a letter but can otherwise be alphanumeric, without regard to case. The only special character that can be used is an underscore ("_"). I apologize: I am not very experienced with regular expressions in general, let alone regular expressions for FLEX.

Thank you for your help.


Solution

  • [a-zA-Z_][a-zA-Z_0-9]*
    

    A letter or an underscore to start, followed by zero or more letters, digits, or underscores.

    Fields are a special case. They are prefixed by $:

    $0
    $1
    

    and also

    $NF
    $i
    

    You'll have to decide how you're going to deal with those.
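
To make the two points above concrete, here is a minimal hand-written C sketch of what a scanner would do with these patterns. It is not flex output: the names `tok_kind`, `match_name`, and `next_token` are hypothetical, and treating `$` as its own one-character token (so that `$0`, `$NF`, and `$i` are left to the parser) is only one possible way to deal with fields.

```c
#include <assert.h>
#include <stddef.h>

/* Token kinds for this sketch (hypothetical names, not part of flex or awk). */
enum tok_kind { TOK_NONE, TOK_NAME, TOK_FIELD_OP };

/* ASCII-only character classes, mirroring [a-zA-Z_] and [a-zA-Z_0-9]. */
static int is_name_start(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

static int is_name_char(char c)
{
    return is_name_start(c) || (c >= '0' && c <= '9');
}

/* Length of the longest prefix of s matching [a-zA-Z_][a-zA-Z_0-9]*,
 * or 0 if s does not start with a name. */
static size_t match_name(const char *s)
{
    assert(s != NULL);
    if (!is_name_start(s[0]))
        return 0;
    size_t n = 1;
    while (is_name_char(s[n]))   /* stops at '\0' or any other character */
        n++;
    return n;
}

/* Classify the token at the start of s and store its length in *len.
 * Assumption: '$' is returned as a separate token, so "$NF" becomes
 * TOK_FIELD_OP followed by TOK_NAME. Anything else is TOK_NONE here. */
static enum tok_kind next_token(const char *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    if (s[0] == '$') {
        *len = 1;
        return TOK_FIELD_OP;
    }
    *len = match_name(s);
    return *len > 0 ? TOK_NAME : TOK_NONE;
}
```

With this split, `$NF` scans as a `$` token followed by the name `NF`, and `$i` as `$` followed by `i`. In a flex specification the same idea would be two rules, one for `"$"` and one for the name pattern.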