I'm working on a homework assignment to use Flex to create a lexer. The last requirement I have to meet is:
The definition for the identifiers should be modified so that underscores can be included, however, consecutive underscores, leading and trailing underscores should not be permitted.
The given regex is:
[A-Za-z][A-Za-z0-9]*
Getting it to recognize underscores was easy; I just added the underscore to the second character class:
[A-Za-z][A-Za-z0-9_]*
As is, the regex does not match any strings with leading underscores.
While doing my due diligence to make sure I wasn't posting something I didn't need to, I came up with this regex, which seems to work:
[A-Za-z]([A-Za-z0-9][_]?[A-Za-z0-9])*
It looks for a letter at the start, then a repeating pattern of an alphanumeric character, an optional underscore, and another alphanumeric character. While this works, I don't think it is what is expected, and I was hoping to get some advice on better approaches.
I've been testing using the following strings (provided by the instructor):
name_1
name__2
_name3
name4_
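A quick way to check a candidate pattern against these four strings is a whole-string match. The sketch below uses Python's re module as a stand-in for Flex, which is an assumption made for convenience; the character-class syntax used here means the same thing in both. The names CANDIDATE, SAMPLES, and is_identifier are made up for illustration. Keep in mind that Flex matches the longest valid prefix rather than the whole input, so a real scanner would hand the leftover characters to other rules.

```python
import re

# Pattern from the question; Python's `re` stands in for Flex here (assumption).
CANDIDATE = re.compile(r"[A-Za-z]([A-Za-z0-9][_]?[A-Za-z0-9])*")

# Instructor-provided test strings.
SAMPLES = ["name_1", "name__2", "_name3", "name4_"]


def is_identifier(text):
    """Return True if the whole string matches the candidate pattern."""
    return CANDIDATE.fullmatch(text) is not None


results = {s: is_identifier(s) for s in SAMPLES}
for s, ok in results.items():
    print(f"{s:8} {'accepted' if ok else 'rejected'}")
```

Only name_1 is accepted in full, which is the outcome the assignment asks for on these four inputs.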
In your regex
[A-Za-z]([A-Za-z0-9][_]?[A-Za-z0-9])*
the first [A-Za-z0-9] inside the group can, and in fact must, be omitted. With it, each repetition consumes at least two characters and can never start with an underscore, so identifiers such as ab or a_b are not matched in full. Dropping it leads to:
[A-Za-z]([_]?[A-Za-z0-9])*
That seems to be exactly what was asked, and it is a good exercise in studying the effect of combining optional elements under repetition in a regular expression.
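To see where the two patterns differ, it helps to try a few short inputs that the instructor's list does not cover. This is only a sketch: Python's re module stands in for Flex, whole-string matching is used instead of Flex's longest-prefix matching, and the names ORIGINAL, SIMPLIFIED, CASES, and accepts are hypothetical.

```python
import re

# Both patterns are copied from the discussion; `re` stands in for Flex (assumption).
ORIGINAL = re.compile(r"[A-Za-z]([A-Za-z0-9][_]?[A-Za-z0-9])*")
SIMPLIFIED = re.compile(r"[A-Za-z]([_]?[A-Za-z0-9])*")

# Hypothetical extra inputs, chosen to expose the difference.
CASES = ["a", "ab", "a_b", "a__b", "a_"]


def accepts(pattern, text):
    """Return True if the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None


# Map each input to (accepted by ORIGINAL, accepted by SIMPLIFIED).
comparison = {s: (accepts(ORIGINAL, s), accepts(SIMPLIFIED, s)) for s in CASES}
for s, (orig_ok, simple_ok) in comparison.items():
    print(f"{s:5} original={orig_ok} simplified={simple_ok}")
```

Both patterns accept a and reject a__b and a_, but only the simplified one accepts ab and a_b, which are valid identifiers under the assignment's rule.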