Search code examples
pythonpyparsing

Issues debugging infix grammar in pyparsing


I wrote this script in order to parse statements with a syntax similar to prolog, treating connectives as operators with precedence:

import pyparsing as pyp

alphabet = "abcdefghijklmnopqrstuvwxyz"
alphabet = alphabet + alphabet.upper()

symbol = pyp.Word(alphabet)

predicate = symbol + "(" + pyp.ZeroOrMore(symbol + ",") + symbol + ")"

parenthetic = pyp.Forward()

pyp_formula = pyp.infixNotation((predicate | parenthetic),
       [
           (pyp.oneOf('~'), 1, pyp.opAssoc.RIGHT),
           (pyp.oneOf(','), 2, pyp.opAssoc.LEFT),
           (pyp.oneOf(';'), 2, pyp.opAssoc.LEFT),
           (pyp.oneOf('::'), 2, pyp.opAssoc.LEFT),
           (pyp.oneOf('->'), 2, pyp.opAssoc.LEFT),
           (pyp.oneOf(':-'), 2, pyp.opAssoc.LEFT),
           (pyp.oneOf('--'), 1, pyp.opAssoc.LEFT),
           (pyp.oneOf('.'), 1, pyp.opAssoc.LEFT),
       ])

parenthetic << "(" + pyp_formula + ")"

When I run

parse = pyp_formula.parseString('d(A, D), e(D) :- f(A), g(D); h(D).')

parse_list = formula_parse.asList()

print(parse_list)

I get that "d(A, D), e(D)" has not even been split into two predicates and "f(A), g(D); h(D)" is treated as a single terminal within a non-terminal.

[[[['d', '(', 'A', ',', 'D', ')', ',', 'e', '(', 'D', ')'],
':-', [['f', '(', 'A', ')', ',', 'g', '(', 'D', ')'],
';', 'h', '(', 'D', ')']], '.']]

I've tried several alternatives but I don't seem to be able to get the right parse.

Any help is welcome!


Solution

  • There is no need to define parenthetic, infixNotation does that for you. And if you pyp.Group your predicate expression, then the symbol and symbol args will get grouped for you. pyparsing also has the delimitedList helper as a short cut for expr, expr, expr: use pyp.delimitedList(expr). The default delimiter is ',', but you can define other delimiters. Lastly, with a list of operators this long, it is likely that you will have performance issues with any non-trivial parse - add a call to ParserElement.enablePackrat() to turn on an internal parsing cache.

    Here is how your code looks with these changes:

    import pyparsing as pyp
    pyp.ParserElement.enablePackrat()
    
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    alphabet = alphabet + alphabet.upper()
    
    symbol = pyp.Word(alphabet)
    
    predicate = pyp.Group(symbol + "(" + pyp.delimitedList(symbol) + ")")
    
    pyp_formula = pyp.infixNotation(predicate,
           [
               (pyp.oneOf('~'), 1, pyp.opAssoc.RIGHT),
               (pyp.oneOf(','), 2, pyp.opAssoc.LEFT),
               (pyp.oneOf(';'), 2, pyp.opAssoc.LEFT),
               (pyp.oneOf('::'), 2, pyp.opAssoc.LEFT),
               (pyp.oneOf('->'), 2, pyp.opAssoc.LEFT),
               (pyp.oneOf(':-'), 2, pyp.opAssoc.LEFT),
               (pyp.oneOf('--'), 1, pyp.opAssoc.LEFT),
               (pyp.oneOf('.'), 1, pyp.opAssoc.LEFT),
           ])
    
    pyp_formula.runTests(
        '''
        d(A, D) -> h(D).
        d(A, D), e(D) :- f(A), g(D); h(D).
        '''
        )
    

    And the results:

    d(A, D) -> h(D).
    [[[['d', '(', 'A', 'D', ')'], '->', ['h', '(', 'D', ')']], '.']]
    [0]:
      [[['d', '(', 'A', 'D', ')'], '->', ['h', '(', 'D', ')']], '.']
      [0]:
        [['d', '(', 'A', 'D', ')'], '->', ['h', '(', 'D', ')']]
        [0]:
          ['d', '(', 'A', 'D', ')']
        [1]:
          ->
        [2]:
          ['h', '(', 'D', ')']
      [1]:
        .
    
    
    d(A, D), e(D) :- f(A), g(D); h(D).
    [[[[['d', '(', 'A', 'D', ')'], ',', ['e', '(', 'D', ')']], ':-', [[['f', '(', 'A', ')'], ',', ['g', '(', 'D', ')']], ';', ['h', '(', 'D', ')']]], '.']]
    [0]:
      [[[['d', '(', 'A', 'D', ')'], ',', ['e', '(', 'D', ')']], ':-', [[['f', '(', 'A', ')'], ',', ['g', '(', 'D', ')']], ';', ['h', '(', 'D', ')']]], '.']
      [0]:
        [[['d', '(', 'A', 'D', ')'], ',', ['e', '(', 'D', ')']], ':-', [[['f', '(', 'A', ')'], ',', ['g', '(', 'D', ')']], ';', ['h', '(', 'D', ')']]]
        [0]:
          [['d', '(', 'A', 'D', ')'], ',', ['e', '(', 'D', ')']]
          [0]:
            ['d', '(', 'A', 'D', ')']
          [1]:
            ,
          [2]:
            ['e', '(', 'D', ')']
        [1]:
          :-
        [2]:
          [[['f', '(', 'A', ')'], ',', ['g', '(', 'D', ')']], ';', ['h', '(', 'D', ')']]
          [0]:
            [['f', '(', 'A', ')'], ',', ['g', '(', 'D', ')']]
            [0]:
              ['f', '(', 'A', ')']
            [1]:
              ,
            [2]:
              ['g', '(', 'D', ')']
          [1]:
            ;
          [2]:
            ['h', '(', 'D', ')']
      [1]:
        .