Tags: regex, syntax, syntax-error, flex-lexer, lex

Lex: Breaking up long regular expressions over multiple lines


What is the correct syntax for breaking long lex regular expressions over multiple lines in a .l file?

For example, say I have a regular expression like:

word1|word2|word3|word4  ECHO;

When I attempt to do this:

word1|word2|
word3|word4  ECHO;

I get an error. What is the correct way to break up a regex over multiple lines in lex?


Solution

  • With flex (as an extension to standard lex syntax), you can use the (?x:…) syntax, which is similar to the extended syntax of PCRE/Perl. Note that, unlike the bare (?x) form commonly used with PCRE, the text to which the x flag applies must be enclosed in the parentheses [Note 1].

    Within the parentheses, comments and whitespace are ignored unless they are escaped or quoted. So you can write:

    (?x:
       word1 |
       word2 |
       word3 |
       word4 )    ECHO;
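
    To illustrate the "unless they are escaped or quoted" part, here is a minimal sketch of a complete scanner file. The phrases "hello world" and "good bye" and the file name scan.l are made up for illustration; the escaping and quoting forms follow the examples given in the flex manual. As with the snippet above, remove the leading indentation so that the pattern starts in the first column.

    ```lex
    %option noyywrap

    %%
    (?x:
        /* whitespace, newlines and comments in here are ignored */
        word1          |
        word2          |
        /* an escaped or quoted space is matched literally */
        hello\ world   |
        "good bye" )    ECHO;
    .|\n    ;
    %%

    int main(void)
    {
        /* Echo the listed words and phrases, discard everything else. */
        yylex();
        return 0;
    }
    ```

    This should build with something like `flex scan.l && cc lex.yy.c -o scan`. Writing `hello world` with an unescaped space inside the group would match "helloworld" instead, since the space would be dropped.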
    

    Note: This syntax cannot be used in the definitions section, only in the rules section. I don't know if that is by design or whether some future enhancement might lift the restriction.
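
    For comparison, a definition in the definitions section occupies a single line, but a long alternation can still be split into several named pieces and recombined in a rule. The following is only a sketch; the names W12 and W34 are invented for this example. As before, remove the leading indentation so that the names and the rule start in the first column.

    ```lex
    %option noyywrap

    /* Each definition must fit on one line, but a long alternation
       can be split into several named pieces. */
    W12    word1|word2
    W34    word3|word4

    %%
    {W12}|{W34}    ECHO;
    .|\n           ;
    %%

    int main(void)
    {
        yylex();
        return 0;
    }
    ```

    This does not put line breaks inside the pattern itself, but it keeps every line short.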

    See the flex manual for a few more details; they are in the section which starts ‘(?r-s:pattern)’.


    Notes

    1. In PCRE you would typically write (?x) to get an extended regex (Python's re module, which is not PCRE itself, accepts the same spelling at the start of a pattern). The extended mode then continues until the end of the regex, unless you turn it off. I won't even try to explain the rules Perl uses to detect the end of an eXtended regex.