Search code examples
regexcompiler-constructionjflex

Regular expressions representing BNF


I am working on a lexer. I need to identify patterns in BNF and specify them using Regular expressions. I know there are TOKENS e.g. keywords, identifiers, operators etc. So far I have defined Regular expressions:

 digit=[0-9]
    integer={digit}+
    letter=[a-zA-Z]

But the given BNF rule for Identifier is:

   < id > ::= < letter >
                | "_"
                | < id > < digit >
                | < id > < letter >

Since the <id> is defined by Recursive pattern, how can I express this pattern using Regular expression. I tried this Regular expression: id={letter}+|"_"|{id}{digit}+ for the above BNF rule but it is giving me error that Regular expression contains cycle.


Solution

  • Looking at the BNF we can see that an <id> can begin with a letter or underscore. We can also see that an <id> can be followed by either a digit or letter and it is still a valid <id>. This is implies that an <id> begins with either a letter or underscore and can be followed by 0 or more digits or letters. This suggests the following regular expression:

    id = [a-zA-Z_][0-9a-zA-Z]*
    
    1. [a-zA-Z_] Matches a letter or '_'
    2. [0-9a-zA-Z]* Matches a digit or letter 0 or more times.

    But since you already have {digit} and {letter} already defined as individual character classes, this would be using JFlex (I am not that familiar with JFlex, so I may not have the JFlex syntax exactly right):

    id = ({letter}|_)({digit}|{letter})*
    

    This would be equivalent to the regex:

    ([a-zA-Z]|_)([0-9]|[a-zA-Z])*