antlr4

Why is my antlr4 grammar not parsing this text correctly


I have the following grammar in antlr4 that I am using to parse betting lines.

grammar betting;

/*
 * Parser Rules
 */

market
    :   zero
    |   array
    |   grid
    ;

zero
    :   (outcomePrice NEWLINE)+ outcomePrice
    ;
    
array
    :  TEXT NEWLINE list NEWLINE TEXT list NEWLINE TEXT list
    ;

grid
    :   (TEXT NEWLINE (linePrice NEWLINE)+)+
    ;

list
    :   (NEWLINE NUMBER)+
    ;
    
outcomePrice
    :   TEXT NEWLINE NUMBER
    ;
    
linePrice
    :   line NEWLINE NUMBER
    |   SCORE NEWLINE NUMBER
    ;
    
line
    :   SINGLE_LINE
    |   SPLIT_LINE
    ;

/*
 * Lexer Rules
 */
 
fragment DIGIT 
    :   [0-9]
    ;
    
fragment CHARACTER
    :   [ \u00a0\u00c0-\u01ffa-zA-Z'\-0-9]
    ;

SPLIT_LINE
    :   SINGLE_LINE ',' SINGLE_LINE
    ;
    
SINGLE_LINE
    :   [+\-] NUMBER
    ;
    
SCORE
    :   DIGIT+ '-' DIGIT+
    ;
    
NUMBER         
    :   DIGIT+ (.DIGIT+)? 
    ;
    
TEXT
    :   CHARACTER+
    ;
    
NEWLINE
    :   '\n'
    ;
    
WS
    :   [ \t\r]+ -> skip
    ;

I am testing against the following text,

Kodagu FC
-1.0,-1.5
1.575
-1.5
1.725
-1.5,-2.0
1.925
-2
2.050
-2.0,-2.5
2.300
Rebels FC
+1.0,+1.5
2.250
+1.5
1.975
+1.5,+2.0
1.875
+2
1.675
+2.0,+2.5
1.550

For some reason suddenly a number is not being recognised, I expect the pattern to continue and the 'line' to break/exit at the '\n' character. Why is this the case and are there any tools to easily discover this?

enter image description here


Solution

  • :   DIGIT+ (.DIGIT+)? 
    

    . means "any character". The above pattern matches either a sequence of digits or two sequences of digits with any character in between. So this will match "12.34", but it can also match "12,34" or, yes, "12\n34".

    Put the dot inside quotes so it only matches literal dots.