ANTLR: Parsing 2-digit numbers when other numeric literals are also possible


I'm writing a grammar for a moderately sized language, and I'm trying to implement time literals of the form hh:mm:ss.

However, whenever I try to parse, for example, 12:34:56 as a timeLiteral, I get mismatched token exceptions on the digits. Does anyone know what I might be doing wrong?

Here are the relevant rules as currently defined:

timeLiteral
    :   timePair COLON timePair COLON timePair -> ^(TIMELIT timePair*)
    ;

timePair
    :   DecimalDigit DecimalDigit
    ;

NumericLiteral
    : DecimalLiteral
    ;

fragment DecimalLiteral
    : DecimalDigit+ ('.' DecimalDigit+)?
    ;

fragment DecimalDigit
    : ('0'..'9')
    ;
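To see what the parser actually receives from these rules, the lexer can be modeled with a toy tokenizer. This is a Python sketch, not ANTLR. It assumes COLON is a lexer rule matching ':' (its definition isn't shown above), inlines the fragment DecimalDigit as [0-9], and always takes the longest match, with earlier rules breaking ties. That approximates ANTLR's token selection for this input but is not its actual algorithm.

```python
import re
from typing import List, Tuple

# Toy approximation of the question's lexer rules (not ANTLR itself).
# DecimalDigit is a fragment, so it only exists inside NumericLiteral's
# pattern and has no entry of its own. COLON is assumed to match ':'.
ORIGINAL_RULES = [
    ("NumericLiteral", re.compile(r"[0-9]+(?:\.[0-9]+)?")),
    ("COLON", re.compile(r":")),
]


def tokenize(text: str, rules) -> List[Tuple[str, str]]:
    """Return (token type, token text) pairs using longest match; earlier rules win ties."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in rules:
            match = pattern.match(text, pos)
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("12:34:56", ORIGINAL_RULES))
# [('NumericLiteral', '12'), ('COLON', ':'), ('NumericLiteral', '34'), ('COLON', ':'), ('NumericLiteral', '56')]
```

In this model no DecimalDigit token is ever produced, so timePair, which expects two of them, has nothing it can match.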

Solution

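  • The problem is that the lexer consumes the digits before the parser ever sees them. DecimalDigit+ is greedy, so for 12:34:56 the lexer matches 12 as a single NumericLiteral token, and the parser receives NumericLiteral COLON NumericLiteral COLON NumericLiteral.

    The parser will never see a DecimalDigit token, because DecimalDigit is a fragment rule: fragments can only be used inside other lexer rules and never become tokens on their own.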

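    I would recommend moving the time literal into the lexer. In ANTLR, a rule whose name starts with an uppercase letter is a lexer rule, so rename it to TimeLiteral and keep a small parser rule that wraps it. DecimalLiteral also stops being a fragment, so the parser can use it for ordinary numbers. You'd have something like: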

    timeLiteral
        :   TimeLiteral -> ^(TIMELIT TimeLiteral)
        ;
    
    number
        :   DecimalLiteral
        ;
    
    TimeLiteral
        :   DecimalDigit DecimalDigit ':'
            DecimalDigit DecimalDigit ':'
            DecimalDigit DecimalDigit
        ;
    
    DecimalLiteral
        :   DecimalDigit+ ('.' DecimalDigit+)?
        ;
    
    fragment DecimalDigit
        :   ('0'..'9')
        ;
    

    Keep in mind that the lexer and parser run independently. The lexer decides how the input is split into tokens without knowing which parser rule is active, and the parser can only group the tokens it is given.
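A toy model of the rules in this answer shows the effect of the change. This is a Python approximation, not ANTLR itself: DecimalDigit is inlined as [0-9], the WS rule is a hypothetical addition for the example, and the longest match wins, with earlier rules breaking ties.

```python
import re
from typing import List, Tuple

# Toy approximation of the answer's lexer rules (not ANTLR's actual algorithm).
# The fragment DecimalDigit is inlined as [0-9]; WS is a hypothetical extra rule.
FIXED_RULES = [
    ("TimeLiteral", re.compile(r"[0-9]{2}:[0-9]{2}:[0-9]{2}")),
    ("DecimalLiteral", re.compile(r"[0-9]+(?:\.[0-9]+)?")),
    ("COLON", re.compile(r":")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def tokenize(text: str, rules) -> List[Tuple[str, str]]:
    """Return (token type, token text) pairs using longest match; earlier rules win ties."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in rules:
            match = pattern.match(text, pos)
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
        if best_name != "WS":  # drop whitespace instead of passing it on
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("12:34:56 3.14", FIXED_RULES))
# [('TimeLiteral', '12:34:56'), ('DecimalLiteral', '3.14')]
print(tokenize("7:30", FIXED_RULES))
# [('DecimalLiteral', '7'), ('COLON', ':'), ('DecimalLiteral', '30')]
```

In this model the whole time value arrives as one TimeLiteral token, which is all the timeLiteral parser rule needs. The second call shows a consequence of the fixed-width rule: input that isn't exactly hh:mm:ss falls back to ordinary number and colon tokens.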