
How to write a regular expression that allows non-repeating underscores


I'm working on a homework assignment to use Flex to create a lexer. The last requirement I have to meet is:

The definition for the identifiers should be modified so that underscores can be included; however, consecutive, leading, and trailing underscores should not be permitted.

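The given regex is [A-Za-z][A-Za-z0-9]*. Getting it to recognize underscores was easy: I just added the underscore to the second character class, giving [A-Za-z][A-Za-z0-9_]*. As it stands, this regex already rejects leading underscores, since the first character must be a letter, but it still accepts consecutive and trailing underscores.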
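A quick way to check that claim is the small sketch below. It assumes Python's re module as a stand-in for Flex (Python is not part of the assignment), and it uses re.fullmatch to ask "is the whole string a single identifier?", which only approximates how a Flex scanner tokenizes input. The bracket expressions mean the same thing in both notations.

```python
import re

# The original identifier regex and the version with "_" added to the
# second character class, exactly as written in the question.
ORIGINAL = r"[A-Za-z][A-Za-z0-9]*"
WITH_UNDERSCORE = r"[A-Za-z][A-Za-z0-9_]*"

# Test strings provided by the instructor.
TESTS = ["name_1", "name__2", "_name3", "name4_"]


def matches(pattern: str, text: str) -> bool:
    """Return True only if the entire string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


for s in TESTS:
    print(f"{s:8} original={matches(ORIGINAL, s)} with_underscore={matches(WITH_UNDERSCORE, s)}")
```

With the underscore added, "name_1", "name__2" and "name4_" are all accepted; only "_name3" is rejected.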

While doing my due diligence to make sure I wasn't posting something unnecessary, I came up with this regex, which seems to work: [A-Za-z]([A-Za-z0-9][_]?[A-Za-z0-9])*. It looks for a letter at the start, then a repeating pattern of an alphanumeric character, an optional underscore, and another alphanumeric character. While this works on the test strings, I don't think it is what is expected, and I was hoping to get some advice on better ways to do it.
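To see why this pattern only seems to work, the sketch below runs it against the instructor's strings plus two extra identifiers, "name" and "a_b". The extras are illustrative additions, not part of the assignment, and Python's re.fullmatch is again assumed as a stand-in for "the whole string is one identifier token".

```python
import re

# The pattern from the question: a letter, then zero or more groups of
# alphanumeric, optional underscore, alphanumeric.
QUESTION_PATTERN = re.compile(r"[A-Za-z]([A-Za-z0-9][_]?[A-Za-z0-9])*")


def is_identifier(text: str) -> bool:
    # fullmatch requires the entire string to match, not just a prefix.
    return QUESTION_PATTERN.fullmatch(text) is not None


# Instructor's strings followed by two illustrative extras.
for s in ["name_1", "name__2", "_name3", "name4_", "name", "a_b"]:
    print(s, is_identifier(s))
```

The four instructor strings behave as required, but both extras print False: after the first letter, the rest of the string has to split into chunks of exactly two alphanumerics or alphanumeric-underscore-alphanumeric, and neither "ame" nor "_b" can be split that way.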

I've been testing using the following strings (provided by the instructor):

name_1
name__2
_name3
name4_

Solution

  • In your [A-Za-z]([A-Za-z0-9][_]?[A-Za-z0-9])*, the first [A-Za-z0-9] can (and must) be omitted. Consider, e.g., single-letter segments as in "a_b", or even a plain name such as "name": every repetition of the group consumes at least two alphanumeric characters, so both are rejected. Dropping it leads to [A-Za-z]([_]?[A-Za-z0-9])*. That seems to be exactly what was asked, and it is a good exercise for studying the effects of combining optional elements under repetition in a regular expression.
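The corrected pattern can be checked the same way. This is a sketch in Python (an assumption, not part of the Flex assignment); the cases beyond the instructor's four strings are illustrative additions.

```python
import re

# The pattern from the answer: a letter, then zero or more repetitions of
# "optional underscore followed by an alphanumeric character".
IDENTIFIER = re.compile(r"[A-Za-z]([_]?[A-Za-z0-9])*")


def is_identifier(text: str) -> bool:
    # fullmatch requires the entire string to match, not just a prefix.
    return IDENTIFIER.fullmatch(text) is not None


cases = {
    "name_1": True,    # single inner underscore
    "name__2": False,  # consecutive underscores
    "_name3": False,   # leading underscore
    "name4_": False,   # trailing underscore
    "name": True,      # no underscore at all
    "a_b": True,       # single-letter segments
    "a": True,         # single-letter identifier
}

for text, expected in cases.items():
    assert is_identifier(text) == expected, text
print("all cases behave as expected")
```

Note that fullmatch tests whole strings. Inside a Flex scanner, which takes the longest match at each input position, an input such as name__2 would typically not be rejected as a whole; it would be split into the identifier name followed by whatever other rules match __2.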