parsing, peg, tatsu

Failure to match character that matches rule with PEG parser


I'm trying to parse Java-style floating point numbers (accepting underscores in the middle of digits) and have simplified the grammar presented in the Java spec:

float_lit = [[DIGITS] '.'] DIGITS [FLOAT_EXP] [FLOAT_SUFFIX] ;

DIGITS = /\d[\d_]*\d/ | /\d/ ;
FLOAT_EXP = ( 'e' | 'E' ) [ '+' | '-' ] DIGITS ;
FLOAT_SUFFIX = 'f' | 'F' | 'd' | 'D' ;

Unfortunately, this doesn't accept the "1e10" input, weirdly failing to match the 'e' within FLOAT_EXP as shown in the trace below:

<float_lit ~1:1
1e10
<DIGITS<float_lit ~1:1
1e10
!'' /\d[\d_]*?\d/
1e10
>'1' /\d/
e10
>DIGITS<float_lit ~1:2
e10
!'.'
e10
<DIGITS<float_lit ~1:1
1e10
>DIGITS<float_lit ~1:2
e10
<FLOAT_EXP<float_lit ~1:2
e10
!'e'
e10
!'E'
e10
!FLOAT_EXP<float_lit ~1:2
e10
<FLOAT_SUFFIX<float_lit ~1:2
e10
!'f'
e10
!'F'
e10
!'d'
e10
!'D'
e10
!FLOAT_SUFFIX<float_lit ~1:2
e10
>float_lit ~1:2
e10
'1'

Can anyone point out what I'm doing wrong?


Solution

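  • The issue here is TatSu's nameguard for tokens. When an alphanumeric token such as 'e' is immediately followed by another alphanumeric character (here the '1' in "1e10"), TatSu rejects the match so that a token is not consumed eagerly from the front of a longer name.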

    The solution is to use regular expressions instead of token choices to match these characters, since patterns are not subject to the nameguard check:

    float_lit    = [[DIGITS] '.'] DIGITS [FLOAT_EXP] [FLOAT_SUFFIX] ;
    DIGITS       = /\d[\d_]*\d/ | /\d/ ;
    FLOAT_EXP    = /[eE][+-]?/ DIGITS ;
    FLOAT_SUFFIX = /[fFdD]/ ;
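
To make the difference between the two grammars concrete, here is a minimal sketch in plain Python that models a name guard. It is a simplified approximation written for illustration, not TatSu's actual implementation, and the helper names `match_token` and `match_pattern` are hypothetical. It shows why a token 'e' is rejected at position 1 of "1e10" while a pattern matching the same character is accepted.

```python
import re


def match_token(text: str, pos: int, token: str, nameguard: bool = True):
    """Return the position after `token`, or None if the match is rejected.

    Simplified model of a name guard (an assumption for illustration,
    not TatSu's real code).
    """
    if not text.startswith(token, pos):
        return None
    end = pos + len(token)
    if nameguard and token.isalnum() and token[0].isalpha():
        # The token looks like a name. Reject it when the input continues
        # with another name character, e.g. 'e' followed by '1'.
        if end < len(text) and (text[end].isalnum() or text[end] == "_"):
            return None
    return end


def match_pattern(text: str, pos: int, pattern: str):
    """Return the position after a regex match at `pos`, or None.

    No name guard is applied here, mirroring the regexp-based fix.
    """
    m = re.compile(pattern).match(text, pos)
    return m.end() if m else None


source = "1e10"
token_result = match_token(source, 1, "e")                       # guarded token
pattern_result = match_pattern(source, 1, r"[eE][+-]?")          # pattern
unguarded_result = match_token(source, 1, "e", nameguard=False)  # guard off

print(token_result, pattern_result, unguarded_result)  # prints: None 2 2
```

In this model the '.' token is unaffected because it is not alphanumeric, which is consistent with the trace: only the 'e' and 'E' tokens fail although the next input character is 'e'.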