
How to match identifiers that do NOT consist entirely of digits?


I have the following lexer rule: ID : [a-z][a-z0-9_]*;

It works well, except that I also need it to match identifiers like 1a or 222z222, while still rejecting all-digit strings like 1 or 999.

So, what should I do to solve the problem?


Solution

  • Your lexer rule is [a-z][a-z0-9_]*. It matches a lowercase letter followed by zero or more lowercase letters, digits, or underscores, so it can never match anything that starts with a digit.

    If you want identifiers to start with either a lowercase letter or a digit, but not to consist entirely of digits, then try:

    ID : [a-z][a-z0-9_]* | [0-9]+[a-z_][a-z0-9_]* ;
    

    The rule has two alternatives:

    • [a-z][a-z0-9_]* : matches identifiers that start with a lowercase letter.
    • [0-9]+[a-z_][a-z0-9_]* : if the identifier starts with digits, then after one or more digits it expects one letter or underscore, followed by zero or more letters, digits, or underscores.

    You can write the same rule more compactly as ID : ([a-z]|[0-9]+[a-z_])[a-z0-9_]*;
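
As a quick sanity check, the combined pattern can be tried outside ANTLR. The sketch below is only an approximation: it assumes the character classes behave the same in Python's re module as in an ANTLR lexer rule, and the names ID_PATTERN and is_identifier are made up for illustration. It tests whole strings in isolation, which is not how a lexer tokenizes a stream.

```python
import re

# Python translation of the combined rule (hypothetical name, not part of ANTLR).
# The character classes are copied unchanged from the grammar.
ID_PATTERN = re.compile(r"(?:[a-z]|[0-9]+[a-z_])[a-z0-9_]*")


def is_identifier(text: str) -> bool:
    """Return True if the whole string matches the identifier pattern."""
    return ID_PATTERN.fullmatch(text) is not None


# Samples taken from the question: the first three should be accepted,
# the all-digit strings should be rejected.
for sample in ["abc", "1a", "222z222", "1", "999"]:
    print(sample, is_identifier(sample))
```

Note that in a real grammar the result also depends on the other lexer rules: ANTLR picks the longest match and, on a tie, the rule defined first, so a separate integer rule would still handle inputs such as 999.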