How can I unwrap results from pyparsing helper functions?


I'm currently implementing a dialect of Prolog in Python. I'm using the wonderful pyparsing module for this purpose, and I've found it to work very well for other projects involving context-free grammars.

As I'm moving into context-sensitive grammars, I'm gradually getting used to pyparsing's style. pyparsing.nestedExpr and pyparsing.delimitedList are two things I'm still getting acquainted with. Right now I'm having trouble with pyparsing.delimitedList: it achieves what I'm looking for, but each individual term in the example code below is returned wrapped in its own list, even though I haven't used pyparsing.Group on any of those terms.
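For reference, here is a minimal sketch of how delimitedList and Group shape the returned tokens. It uses a hypothetical one-word grammar (the names word, flat and nested are illustrative only), not the grammar from the full code below.

```python
import pyparsing as pp

# Hypothetical item expression, used only for this illustration.
word = pp.Word(pp.alphanums)

# delimitedList on its own returns the matched items as one flat token list.
flat = pp.delimitedList(word).parseString("a, b, c")
print(flat.asList())    # ['a', 'b', 'c']

# Wrapping each item in Group puts every item in its own sub-list.
nested = pp.delimitedList(pp.Group(word)).parseString("a, b, c")
print(nested.asList())  # [['a'], ['b'], ['c']]
```

The second shape is the kind of extra nesting described above.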

Refactoring to use pyparsing.nestedExpr and pyparsing.infixNotation are next on my TODOs after solving this problem, so please don't panic that I'm not using them yet. I also suspect, but don't yet know, that I'll have to prevent matches for term_list on the left side of the rule expression. This is to say that the code is a work in progress and will see significant change over time as I experiment with the library further.

I think pyparsing.ungroup can be used to solve the problem, but wrapping the list as pyparsing.ungroup(pyparsing.delimitedList(...)) doesn't seem to have any effect in this case.
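As a point of comparison, this small sketch shows ungroup removing one level of nesting that was created by an explicit Group. The grammar is hypothetical and much simpler than the one in question, so it only illustrates what the helper does, not why it has no visible effect here.

```python
import pyparsing as pp

# Hypothetical grammar: a comma-separated list of words, explicitly grouped.
word = pp.Word(pp.alphanums)
grouped = pp.Group(pp.delimitedList(word))

# Group nests the whole list one level deep.
grouped_tokens = grouped.parseString("a, b").asList()
print(grouped_tokens)    # [['a', 'b']]

# ungroup strips that outer level again.
ungrouped_tokens = pp.ungroup(grouped).parseString("a, b").asList()
print(ungrouped_tokens)  # ['a', 'b']
```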

Output Logic

result = root.parseString('''
A :- True
Z :- 5
''')
print(result.dump())
print(result.rules[0].goals)

Results

[['A', 'True'], ['Z', '5']]
- rules: [['A', 'True'], ['Z', '5']]
  [0]:
    ['A', 'True']
    - goals: [['True']]
      [0]:
        ['True']
  [1]:
    ['Z', '5']
    - goals: [['5']]
      [0]:
        ['5']
[['True']]

Expected Results

[['A', 'True'], ['Z', '5']]
- rules: [['A', 'True'], ['Z', '5']]
  [0]:
    ['A', 'True']
    - goals: ['True']
  [1]:
    ['Z', '5']
    - goals: ['5']
['True']

Full Code

import pyparsing as pp

# These types are the language primitives
atom = pp.Word(pp.alphanums)
number = pp.Word(pp.nums)
variable = pp.Word(pp.alphanums)
string = pp.quotedString

# Terms are the basic unit of expression here
compound_term = pp.Forward()
term = (atom ^ number ^ variable ^ pp.Group(compound_term))('terms*')

# A compound term includes a few rules for term composition, such as lists or an atom containing arguments
term_list = pp.Forward()
compound_term <<= (
    string
    ^ term_list
    ^ atom('functor') + pp.Suppress('(') + pp.delimitedList(term('arguments*')) + pp.Suppress(')')
)

term_list <<= pp.Suppress('[') + pp.delimitedList(term('items*')) + pp.Suppress(']')

# The rule operator is an infix operator represented by :-
# On the right side, multiple goals can be composed using AND or OR operators
rule = pp.Group(
    term + pp.Suppress(':-') +
    pp.delimitedList(term('goals*'))
)('rules*')

root = pp.ZeroOrMore(rule)

result = root.parseString(
    '''
    A :- True
    Z :- 5
    ''')
print(result.dump())
print(result.rules[0].goals)

Solution

  • The initial problem is the Group wrapped around compound_term in the definition of term:

    term = (atom ^ number ^ variable ^ pp.Group(compound_term))('terms*')
    

    should be

    term = (atom ^ number ^ variable ^ compound_term)('terms*')
    

    After making that change, and adding a "lhs" results name in your rule (see below), I get this:

    [['A', 'True'], ['Z', '5']]
    - rules: [['A', 'True'], ['Z', '5']]
      [0]:
        ['A', 'True']
        - goals: ['True']
        - lhs: 'A'
      [1]:
        ['Z', '5']
        - goals: ['5']
        - lhs: 'Z'
    ['True']
    

    Some added notes:

    1. atom is defined as

      atom = pp.Word(pp.alphanums)
      

      This will also match "123" as an atom. To ensure that you only match names, use pp.Word(pp.alphas, pp.alphanums), which requires the first character to be a letter and allows any subsequent characters to be letters or digits (the same applies to variable).

    2. I would not add the results name "terms*" on term, since it will end up being used on both the left- and right-hand sides of your ":-" operator. I generally recommend leaving the attachment of results names until the expression is used in higher-level expressions. For instance, I would define rule as:

      rule = pp.Group(term("lhs")
                      + ":-"
                      + pp.delimitedList(term)("goals")
                      )
      
    3. I wouldn't really call ":-" an "infix" operator; I consider operators like "+", "-", "AND", and "OR" to be infix operators. For instance, I don't think x :- y :- z is valid. You'll probably do something like this to add your "AND" and "OR" operators:

      logical_term_expression = pp.infixNotation(term,
                  [
                  ("&&", 2, pp.opAssoc.LEFT,),
                  ("||", 2, pp.opAssoc.LEFT,),
                  ])
      

      Having a results name on term will really make a mess of this. You are more likely to attach classes as parse actions in your operator tuples, as you can see in pyparsing examples like simple_bool.py.

    4. You mentioned using nestedExpr - please don't. That helper is best used when writing a scanner for something like C code, where you might want to just jump over some nested braces without actually parsing the contents. In your DSL, you will want to parse everything properly - but I think infixNotation may be all you need.
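Putting notes 1 through 3 together, a minimal sketch might look like the following. It makes several assumptions: term is simplified to atoms and numbers only (no compound terms or lists), the operators are spelled "&&" and "||", ":-" is suppressed as in the question's code, and goal_expr is a hypothetical name introduced here. Treat it as a starting point rather than a complete grammar.

```python
import pyparsing as pp

# Note 1: an atom must start with a letter, so "123" is no longer an atom.
atom = pp.Word(pp.alphas, pp.alphanums)
number = pp.Word(pp.nums)

# Simplified term for this sketch; no results name is attached here (note 2).
term = atom | number

digits_rejected = False
try:
    atom.parseString("123")
except pp.ParseException:
    digits_rejected = True

# Note 3: goals combined with infix operators. "&&" is listed first,
# so it binds more tightly than "||".
goal_expr = pp.infixNotation(
    term,
    [
        ("&&", 2, pp.opAssoc.LEFT),
        ("||", 2, pp.opAssoc.LEFT),
    ],
)

# Results names are attached where term is used, inside the rule.
rule = pp.Group(
    term("lhs")
    + pp.Suppress(":-")
    + pp.delimitedList(goal_expr)("goals")
)
root = pp.ZeroOrMore(rule)

result = root.parseString("""
A :- True
Z :- 5, x && y || z
""")

print(result[0].lhs)              # A
print(result[0].goals.asList())   # ['True']
print(result[1].goals.asList())   # ['5', [['x', '&&', 'y'], '||', 'z']]
```

Because the names live on the rule rather than on term, the same term expression can be reused as the operand of infixNotation without carrying a results name into every nested group.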