Tags: python, recursion, pyparsing, ebnf

Recursion in Pyparsing


I'm unable to translate this EBNF expression into pyparsing. Any ideas?

token:: [A-Z]
P:: !|token;P|(P^P)|(P*P)

The problem is that when I use recursion, the interpreter fails. Expressions like these should be valid:

(ASD;!^FFF;!)
A;B;C;!
(((A;!^B;!)^C;D;!)*E;!)

Solution

  • To build a recursive grammar with pyparsing, you have to think a little inside-out, using pyparsing's Forward class. With Forward, you define an empty placeholder for an expression that will be defined later. Here is a start at a pyparsing grammar for this BNF:

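    from pyparsing import Literal, Suppress, alphas, oneOf
    
    EXCLAM,SEMI,HAT,STAR = map(Literal,"!;^*")
    LPAR,RPAR = map(Suppress,"()")
    # a single uppercase letter, matching 'token:: [A-Z]'
    token = oneOf(list(alphas.upper()))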
    

    I'm using Literal to define your operators, but Suppress for the grouping parentheses: rather than keeping the ()'s in the output, we'll use pyparsing's Group to physically group the results into sublists.
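
To make the difference concrete, here is a small sketch on a made-up input (the names `kept`, `dropped` and `grouped` are hypothetical, not part of the grammar above). It assumes only that pyparsing is installed: Literal keeps the matched text in the results, Suppress matches it but discards it, and Group nests whatever its contents return into a sublist.

```python
from pyparsing import Group, Literal, Suppress, Word, alphas

# Literal keeps the parentheses in the parsed results
kept = Literal("(") + Word(alphas) + Literal(")")
# Suppress still requires the parentheses, but drops them from the results
dropped = Suppress("(") + Word(alphas) + Suppress(")")
# Group wraps the (suppressed) contents in their own nested sublist
grouped = Group(Suppress("(") + Word(alphas) + Suppress(")"))

print(kept.parseString("(abc)").asList())     # ['(', 'abc', ')']
print(dropped.parseString("(abc)").asList())  # ['abc']
print(grouped.parseString("(abc)").asList())  # [['abc']]
```

This is why the grammar below can suppress LPAR/RPAR without losing structure: the nesting survives as sublists instead.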

    Now we define the placeholder expression with Forward:

    expr = Forward()
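
Before filling in the real grammar, here is a minimal sketch of the same Forward pattern on a hypothetical toy grammar (a word wrapped in any number of parentheses; `nested` and `broken` are illustrative names, not from the question). It also shows what goes wrong if plain `=` is used where `<<=` is needed.

```python
from pyparsing import Forward, Group, ParseException, Suppress, Word, alphas

LPAR, RPAR = map(Suppress, "()")

# Toy grammar:  nested :: word | "(" nested ")"
nested = Forward()                                   # empty placeholder
nested <<= Word(alphas) | Group(LPAR + nested + RPAR)  # fill it in, same object
print(nested.parseString("((abc))").asList())        # [[['abc']]]

# With "=", the name is rebound to a new expression, but the inner reference
# still points at the original, never-defined Forward, so recursion fails.
broken = Forward()
broken = Word(alphas) | Group(LPAR + broken + RPAR)
print(broken.parseString("abc").asList())            # ['abc'] (no recursion needed)
try:
    broken.parseString("(abc)")
except ParseException:
    print("'=' version fails on '(abc)'")
```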
    

    And now we can build the expression using this placeholder. We have to use '<<=' as the assignment operator so that expr stays the same Forward object, instead of being rebound to a new expression. Here is my first pass, using your BNF as-is:

    expr <<= (EXCLAM | 
              token + SEMI + expr | 
              Group(LPAR + expr + HAT + expr + RPAR) | 
              Group(LPAR + expr + STAR + expr + RPAR))
    

    This gives these results:

    (ASD;!^FFF;!)
      ^
    Expected ";" (at char 2), (line:1, col:3)
    
    A;B;C;!
    ['A', ';', 'B', ';', 'C', ';', '!']
    
    (((A;!^B;!)^C;D;!)*E;!)
    [[[['A', ';', '!', '^', 'B', ';', '!'], '^', 'C', ';', 'D', ';', '!'], '*', 'E', ';', '!']]
    

    It seems there is an unwritten rule in your BNF: one or more tokens can also appear together before a ';'. That is easily fixed:

    expr <<= (EXCLAM | 
              OneOrMore(token) + SEMI + expr | 
              Group(LPAR + expr + HAT + expr + RPAR) | 
              Group(LPAR + expr + STAR + expr + RPAR))
    

    This now gives:

    (ASD;!^FFF;!)
    [['A', 'S', 'D', ';', '!', '^', 'F', 'F', 'F', ';', '!']]
    
    A;B;C;!
    ['A', ';', 'B', ';', 'C', ';', '!']
    
    (((A;!^B;!)^C;D;!)*E;!)
    [[[['A', ';', '!', '^', 'B', ';', '!'], '^', 'C', ';', 'D', ';', '!'], '*', 'E', ';', '!']]
    

    But it looks like we could benefit from additional grouping, so that the operands of the binary '^' and '*' operators are more clearly separated. So I settled on:

    expr <<= (EXCLAM | 
              Group(OneOrMore(token) + SEMI + ungroup(expr)) | 
              Group(LPAR + expr + HAT + expr + RPAR) | 
              Group(LPAR + expr + STAR + expr + RPAR) )
    

    I think this version of the output will be easier to process:

    (ASD;!^FFF;!)
    [[['A', 'S', 'D', ';', '!'], '^', ['F', 'F', 'F', ';', '!']]]
    
    A;B;C;!
    [['A', ';', 'B', ';', 'C', ';', '!']]
    
    (((A;!^B;!)^C;D;!)*E;!)
    [[[[['A', ';', '!'], '^', ['B', ';', '!']], '^', ['C', ';', 'D', ';', '!']], '*', ['E', ';', '!']]]
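
As an illustration of "easier to process", here is a sketch of a small recursive walker over this output. The helper `to_prefix` is hypothetical (nothing in pyparsing or the question defines it); it assumes that a binary group is exactly `[left, op, right]` with `op` being '^' or '*', and treats every other group as a flat sequence of tokens, ';' separators and a trailing '!'.

```python
from pyparsing import (Forward, Group, Literal, OneOrMore, Suppress,
                       alphas, oneOf, ungroup)

# The final grammar from this answer
EXCLAM, SEMI, HAT, STAR = map(Literal, "!;^*")
LPAR, RPAR = map(Suppress, "()")
token = oneOf(list(alphas.upper()))
expr = Forward()
expr <<= (EXCLAM |
          Group(OneOrMore(token) + SEMI + ungroup(expr)) |
          Group(LPAR + expr + HAT + expr + RPAR) |
          Group(LPAR + expr + STAR + expr + RPAR))

def to_prefix(node):
    """Hypothetical helper: render a parsed node in prefix form, e.g. ^(a, b)."""
    if isinstance(node, str):
        return node
    if len(node) == 3 and node[1] in ("^", "*"):
        # binary group: [left, operator, right]
        return "%s(%s, %s)" % (node[1], to_prefix(node[0]), to_prefix(node[2]))
    # sequence group: tokens, ';' separators and the trailing part, joined back up
    return "".join(to_prefix(part) for part in node)

tree = expr.parseString("(((A;!^B;!)^C;D;!)*E;!)", parseAll=True).asList()[0]
print(to_prefix(tree))  # *(^(^(A;!, B;!), C;D;!), E;!)
```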
    

    Here is the complete script:

    from pyparsing import (Forward, Group, Literal, OneOrMore, ParseException,
                           Suppress, alphas, oneOf, ungroup)
    
    EXCLAM,SEMI,HAT,STAR = map(Literal,"!;^*")
    LPAR,RPAR = map(Suppress,"()")
    token = oneOf(list(alphas.upper()))
    expr = Forward()
    expr <<= (EXCLAM | 
              Group(OneOrMore(token) + SEMI + ungroup(expr)) | 
              Group(LPAR + expr + HAT + expr + RPAR) | 
              Group(LPAR + expr + STAR + expr + RPAR) )
    
    tests = """\
    (ASD;!^FFF;!)
    A;B;C;!
    (((A;!^B;!)^C;D;!)*E;!)""".splitlines()
    
    for t in tests:
        print(t)
        try:
            print(expr.parseString(t).dump())
        except ParseException as pe:
            # point a caret at the location where parsing failed
            print(' '*pe.loc + '^')
            print(pe)
        print()
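
To see what `ungroup(expr)` contributes in the second alternative, here is a sketch that builds the same grammar with and without it. The `build` function and its `flatten_tail` flag are hypothetical, introduced only for this comparison. Without ungroup, each `token;P` step nests the rest of the chain one level deeper; with it, the chain stays one flat sublist.

```python
from pyparsing import (Forward, Group, Literal, OneOrMore, Suppress,
                       alphas, oneOf, ungroup)

def build(flatten_tail):
    """Hypothetical helper: the final grammar, optionally without ungroup()."""
    EXCLAM, SEMI, HAT, STAR = map(Literal, "!;^*")
    LPAR, RPAR = map(Suppress, "()")
    token = oneOf(list(alphas.upper()))
    expr = Forward()
    # ungroup() strips the Group that the recursive expr adds around the tail
    tail = ungroup(expr) if flatten_tail else expr
    expr <<= (EXCLAM |
              Group(OneOrMore(token) + SEMI + tail) |
              Group(LPAR + expr + HAT + expr + RPAR) |
              Group(LPAR + expr + STAR + expr + RPAR))
    return expr

print(build(True).parseString("A;B;C;!").asList())
# [['A', ';', 'B', ';', 'C', ';', '!']]
print(build(False).parseString("A;B;C;!").asList())
# [['A', ';', ['B', ';', ['C', ';', '!']]]]
```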
    

    Last note: I assumed that "AAA" was 3 successive 'A' tokens. If you meant tokens to be words of one or more uppercase letters, then change 'OneOrMore(token)' in the expression to 'Word(alphas.upper())' (and import Word); then you'll get this result for your first test case:

    [[['ASD', ';', '!'], '^', ['FFF', ';', '!']]]
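
For reference, here is a sketch of that variant as a self-contained script. It is the complete script's grammar with only the 'OneOrMore(token)' part swapped for 'Word(alphas.upper())', as described in the note; nothing else is changed.

```python
from pyparsing import Forward, Group, Literal, Suppress, Word, alphas, ungroup

EXCLAM, SEMI, HAT, STAR = map(Literal, "!;^*")
LPAR, RPAR = map(Suppress, "()")
expr = Forward()
expr <<= (EXCLAM |
          # a whole word of uppercase letters instead of single-letter tokens
          Group(Word(alphas.upper()) + SEMI + ungroup(expr)) |
          Group(LPAR + expr + HAT + expr + RPAR) |
          Group(LPAR + expr + STAR + expr + RPAR))

print(expr.parseString("(ASD;!^FFF;!)", parseAll=True).asList())
# [[['ASD', ';', '!'], '^', ['FFF', ';', '!']]]
```

Single-letter inputs such as A;B;C;! parse exactly as before, since each one-letter word is still matched by Word.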