Extending a regex definition without repeating the definition


I was wondering if there is some way to extend a regex definition without repeating the symbols inside.

For example, each of the following definitions contains the previous one. Is there any notation to build lettersAndNumbers from letters, lettersAndNumbersAndUnderscore from lettersAndNumbers, and so on?

%{
%}
letters                          [A-Za-z]
lettersAndNumbers                [A-Za-z0-9]  /* extension of letters */
lettersAndNumbersAndUnderscore   [A-Za-z0-9_] /* extension of lettersAndNumbers */
%%

I have other definitions with more complicated symbols, and I would like to remove all of this duplication.


Solution

  • Sure. Just use the | operator:

    letters             [a-zA-Z]
    digits              [0-9]
    lettersAndDigits    {letters}|{digits}
    wordCharacters      {lettersAndDigits}|_
    

    Flex also provides the {+} operator, which computes the union of two character classes, and the often more useful {-} operator, which computes set difference. Both are described in the documentation for Flex patterns, which is certainly worth reading if you are using Flex.
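
    As a rough illustration, here is a minimal sketch of both operators applied directly to character classes. It assumes a Flex version recent enough to support {+} and {-}; the printed labels are made up for this example.

    ```lex
    %option noyywrap
    %{
    #include <stdio.h>
    %}
    %%
    [a-z]{-}[aeiou]       { printf("consonant: %s\n", yytext); }
    [A-Za-z]{+}[0-9]      { printf("other letter or digit: %s\n", yytext); }
    .|\n                  { /* skip anything else */ }
    %%
    int main(void) { yylex(); return 0; }
    ```

    The first rule matches the lowercase consonants; the second matches the same set as [A-Za-z0-9] and picks up whatever the first rule does not claim, because Flex prefers the earlier rule when two matches have the same length.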

    Unfortunately, those operators cannot be used with macros, because the expansions of Flex macros are automatically surrounded with parentheses (which is why the macros above work in Flex). For Flex, a parenthesised character class is a subexpression, not a character class, so it's not allowed as an operand to the set operators. But even if you could do that, it wouldn't provide any real advantage. The compiled regular expressions are essentially the same; a union of character classes is no more efficient than a union of patterns.
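
    To make the expansion concrete, here is a small sketch (the name alnum is only illustrative). The comment shows the form Flex should reject, and the definition below it is the alternation that works:

    ```lex
    %option noyywrap
    letters     [A-Za-z]
    digits      [0-9]
    /* Not accepted: {letters}{+}{digits} expands to ([A-Za-z]){+}([0-9]),
       and a parenthesised group is not a valid operand for {+}. */
    alnum       {letters}|{digits}
    %%
    {alnum}+    { /* expands to (([A-Za-z])|([0-9]))+ */ }
    .|\n        { /* skip anything else */ }
    %%
    int main(void) { yylex(); return 0; }
    ```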

    For these particular cases, though, there's no need to use macros. Just use the built-in named character classes. Instead of {letters}, you can use [[:alpha:]]; {lettersAndDigits} is [[:alnum:]] and {wordCharacters} is [[:alnum:]_]. Using standard POSIX classes frees anyone reading your code from having to figure out what your idiosyncratic macros expand to.
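
    A minimal sketch of the three classes used directly in rules, with no definitions section at all (the printed labels are only for illustration):

    ```lex
    %option noyywrap
    %{
    #include <stdio.h>
    %}
    %%
    [[:alpha:]]+      { printf("letters: %s\n", yytext); }
    [[:alnum:]]+      { printf("letters and digits: %s\n", yytext); }
    [[:alnum:]_]+     { printf("word characters: %s\n", yytext); }
    .|\n              { /* skip anything else */ }
    %%
    int main(void) { yylex(); return 0; }
    ```

    Flex takes the longest match and, on a tie, the earliest rule, so each input run is reported by the narrowest class that covers all of it.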