How to include a literal '#' in a TatSu grammar?


I can't get a TatSu grammar that includes a literal '#' to parse its input.

Here is a minimal example:

import tatsu

G = r'''
atom = /[0-9]+/
     | '#' atom
     ;
'''

p = tatsu.compile(G)
p.parse('#345', trace=True)

The parse throws a FailedParse exception. The trace seems to show that the parser is not matching the '#' literal:

<atom ~1:1
#345
!'' /[0-9]+/
!'#' 
!atom ~1:1
#345

If I change the grammar to use a symbol other than '#', it works fine. For example this works:

G1 = r'''
atom = /[0-9]+/
     | '@' atom
     ;
'''

tatsu.parse(G1, '@345')  # --> ['@', '345']

Unfortunately, I can't change the format of the input data.


Solution

  • This is likely a bug in the version of TatSu you are using.

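    If you need to stick with that version, try adding an @@eol_comments :: // directive (or a similar pattern) to the grammar. The directive overrides the end-of-line comment pattern, so the parser should presumably stop skipping '#' as the start of a comment.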

    The original grammar works for me without any changes:

    [ins] In [1]: import tatsu

    [ins] In [2]: G = r'''
             ...: atom = /[0-9]+/
             ...:      | '#' atom
             ...:      ;
             ...: '''
             ...:
             ...: p = tatsu.compile(G)
             ...: p.parse('#345', trace=True)
    ↙atom ~1:1
    #345
    ≢'' /[0-9]+/
    #345
    ≡'#'
    345
    ↙atom↙atom ~1:2
    345
    ≡'345' /[0-9]+/
    ≡atom↙atom
    ≡atom
    Out[2]: ('#', '345')

    AFTERNOTE: Yes, the output above is from the master version of TatSu (where sequences are returned as tuples), but I just checked against v4.4.0, and the result is equivalent.
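
To make the suggested workaround less mysterious, here is a small standard-library sketch of what may be happening. It assumes (this is not confirmed above) that the affected version skips end-of-line comments starting with '#' before it tries to match a token. The helper `match_literal` and the patterns passed to it are hypothetical and are not part of TatSu; the sketch only models the order of operations.

```python
import re


def match_literal(text, literal, eol_comment_pattern=None):
    """Toy model: skip whitespace and end-of-line comments, then try the literal.

    Hypothetical helper for illustration only; not TatSu's implementation.
    """
    pos = 0
    while True:
        start = pos
        # Skip any leading whitespace.
        pos = re.compile(r'\s*').match(text, pos).end()
        # Skip one end-of-line comment, if a comment pattern is configured.
        if eol_comment_pattern:
            m = re.compile(eol_comment_pattern, re.MULTILINE).match(text, pos)
            if m:
                pos = m.end()
        # Stop once neither step consumed anything.
        if pos == start:
            break
    return text.startswith(literal, pos)


# Assumed default: '#' starts a comment, so '#345' is skipped entirely.
print(match_literal('#345', '#', r'#.*?$'))   # False
# Overridden comment pattern: '#' is left alone and the literal matches.
print(match_literal('#345', '#', r'//.*?$'))  # True
```

Under that assumption, any comment pattern that cannot match at '#' would leave the literal available to the grammar, which is consistent with the advice to override @@eol_comments.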