I'm currently implementing a dialect of Prolog in Python. I'm using the wonderful `pyparsing` module for this purpose, and I've found it to work very well for other projects involving context-free grammars. As I'm moving into context-sensitive grammars, I'm gradually getting used to `pyparsing`'s style. `pyparsing.nestedExpr` and `pyparsing.delimitedList` are two things I'm still getting acquainted with. Right now I'm having trouble with `pyparsing.delimitedList`: it achieves what I'm looking for, but each individual `term` in the example code below is returned in a list, even though I haven't used `pyparsing.Group` on any terms.
Refactoring to use `pyparsing.nestedExpr` and `pyparsing.infixNotation` is next on my TODO list after solving this problem, so please don't panic that I'm not using them yet. I also suspect, but don't yet know, that I'll have to prevent `term_list` from matching on the left side of the rule expression. In other words, the code is a work in progress and will change significantly as I experiment further with the library.
I think `pyparsing.ungroup` can be used to solve the problem, but `pyparsing.ungroup(pyparsing.delimitedList...` doesn't seem to have any effect in this case.
When I run the parser on a small sample:

```python
result = root.parseString('''
A :- True
Z :- 5
''')
print(result.dump())
print(result.rules[0].goals)
```
I get this output:

```
[['A', 'True'], ['Z', '5']]
- rules: [['A', 'True'], ['Z', '5']]
  [0]:
    ['A', 'True']
    - goals: [['True']]
      [0]:
        ['True']
  [1]:
    ['Z', '5']
    - goals: [['5']]
      [0]:
        ['5']
[['True']]
```
What I'm after is this, where each goal is no longer wrapped in its own list:

```
[['A', 'True'], ['Z', '5']]
- rules: [['A', 'True'], ['Z', '5']]
  [0]:
    ['A', 'True']
    - goals: ['True']
  [1]:
    ['Z', '5']
    - goals: ['5']
['True']
```
Here's the full code:

```python
import pyparsing as pp

# These types are the language primitives
atom = pp.Word(pp.alphanums)
number = pp.Word(pp.nums)
variable = pp.Word(pp.alphanums)
string = pp.quotedString

# Terms are the basic unit of expression here
compound_term = pp.Forward()
term = (atom ^ number ^ variable ^ pp.Group(compound_term))('terms*')

# A compound term includes a few rules for term composition, such as lists or an atom containing arguments
term_list = pp.Forward()
compound_term <<= \
    string ^ \
    term_list ^ \
    atom('functor') + pp.Suppress('(') + pp.delimitedList(term('arguments*')) + pp.Suppress(')')
term_list <<= pp.Suppress('[') + pp.delimitedList(term('items*')) + pp.Suppress(']')

# The rule operator is an infix operator represented by :-
# On the right side, multiple goals can be composed using AND or OR operators
rule = pp.Group(
    term + pp.Suppress(':-') + \
    pp.delimitedList(term('goals*')) \
)('rules*')
root = pp.ZeroOrMore(rule)

result = root.parseString(
'''
A :- True
Z :- 5
''')
print(result.dump())
print(result.rules[0].goals)
```
The initial problem is the `Group` wrapped around `compound_term` in your definition of `term`:

```python
term = (atom ^ number ^ variable ^ pp.Group(compound_term))('terms*')
```

should be

```python
term = (atom ^ number ^ variable ^ compound_term)('terms*')
```
After making that change, and adding an "lhs" results name in your rule (see below), I get this:

```
[['A', 'True'], ['Z', '5']]
- rules: [['A', 'True'], ['Z', '5']]
  [0]:
    ['A', 'True']
    - goals: ['True']
    - lhs: 'A'
  [1]:
    ['Z', '5']
    - goals: ['5']
    - lhs: 'Z'
['True']
```
Some added notes:

`atom` is defined as

```python
atom = pp.Word(pp.alphanums)
```

This will match "123" as an `atom` as well. To ensure that you only get identifier-style names, use `pp.Word(pp.alphas, pp.alphanums)`. This indicates that the initial character must be alphabetic, and any subsequent characters can be alphabetic or numeric (the same applies to `variable`).
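A quick way to see the difference is to try both definitions on the same inputs. This is a minimal sketch; `loose_atom`, `strict_atom`, and the `matches` helper are illustrative names, not part of the grammar above.

```python
import pyparsing as pp

# The question's definition: any run of letters and digits
loose_atom = pp.Word(pp.alphanums)
# Suggested definition: first character must be a letter, the rest letters or digits
strict_atom = pp.Word(pp.alphas, pp.alphanums)


def matches(expr, text):
    """Hypothetical helper: True if expr consumes all of text."""
    try:
        expr.parseString(text, parseAll=True)
        return True
    except pp.ParseException:
        return False


print(matches(loose_atom, "123"))      # True: digits alone are accepted
print(matches(strict_atom, "123"))     # False: must start with a letter
print(matches(strict_atom, "abc123"))  # True
```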
I would not add the results name "terms*" to `term`, since `term` ends up being used on both the left- and right-hand sides of your ":-" operator. I recommend generally leaving the attachment of results names until the expression is used in higher-level expressions. For instance, I would define `rule` as:

```python
rule = pp.Group(term("lhs")
                + pp.Suppress(":-")  # keep ":-" out of the parsed results
                + pp.delimitedList(term)("goals")
                )
```
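Putting these suggestions together, here is a sketch of the whole grammar with the `Group` removed from `term`, no results name on `term` itself, and the names attached in `rule`. It assumes the stricter `pp.Word(pp.alphas, pp.alphanums)` definitions from the note on `atom` above and suppresses ":-" so only the terms appear in each rule's results. Treat it as a starting point, not a finished Prolog grammar.

```python
import pyparsing as pp

# Primitives; names must start with a letter (stricter definition)
atom = pp.Word(pp.alphas, pp.alphanums)
number = pp.Word(pp.nums)
variable = pp.Word(pp.alphas, pp.alphanums)
string = pp.quotedString

compound_term = pp.Forward()
term_list = pp.Forward()

# No Group and no results name on term itself
term = atom ^ number ^ variable ^ compound_term

compound_term <<= (
    string
    ^ term_list
    ^ atom + pp.Suppress("(") + pp.delimitedList(term) + pp.Suppress(")")
)
term_list <<= pp.Suppress("[") + pp.delimitedList(term) + pp.Suppress("]")

# Results names are attached only where term is used
rule = pp.Group(term("lhs") + pp.Suppress(":-") + pp.delimitedList(term)("goals"))
root = pp.ZeroOrMore(rule)

result = root.parseString("""
A :- True
Z :- 5
""")
print(result.dump())
print(result[0]["goals"])  # ['True']
```

Only simple terms are exercised here; with no `Group` anywhere inside `term`, a compound term such as `f(x, y)` comes back as the flat tokens `'f', 'x', 'y'`.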
I wouldn't really call ":-" an "infix" operator; I consider operators like "+", "-", "AND", and "OR" to be infix operators. For instance, I don't think `x :- y :- z` is valid. You'll probably do something like this to add your "AND" and "OR" operators:

```python
logical_term_expression = pp.infixNotation(term,
    [
        ("&&", 2, pp.opAssoc.LEFT),
        ("||", 2, pp.opAssoc.LEFT),
    ])
```
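To see what that produces, here's a small runnable sketch. It substitutes a plain `pp.Word(pp.alphas, pp.alphanums)` for `term` (an assumption, just to keep it self-contained). Operators listed first in `infixNotation` bind more tightly, so `&&` groups before `||`, and each precedence level comes back as its own nested list.

```python
import pyparsing as pp

# Simplified stand-in for the grammar's term (assumption for this sketch)
term = pp.Word(pp.alphas, pp.alphanums)

logical_term_expression = pp.infixNotation(term,
    [
        ("&&", 2, pp.opAssoc.LEFT),  # listed first: binds tighter
        ("||", 2, pp.opAssoc.LEFT),
    ])

result = logical_term_expression.parseString("A && B || C")
print(result.asList())  # [[['A', '&&', 'B'], '||', 'C']]
```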
Having a results name in `term` will really make a mess of this. You're more likely to attach classes to your operator tuples, as you can see in pyparsing examples like simple_bool.py.
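Here is a minimal sketch of that pattern, loosely modeled on the approach in simple_bool.py rather than copied from it. The `BinOpNode`, `AndNode`, and `OrNode` classes are hypothetical names. Each class is passed as the fourth element of an operator tuple, so pyparsing calls it with the matched tokens and replaces them with the resulting object. You get a tree of nodes instead of nested lists that depend on results names.

```python
import pyparsing as pp


class BinOpNode:
    """Hypothetical node: holds the operands of one binary operator level."""

    def __init__(self, tokens):
        # tokens[0] is the grouped match, e.g. ['A', '&&', 'B'];
        # operands sit at the even positions, operators at the odd ones
        self.args = tokens[0][0::2]

    def __repr__(self):
        return "%s(%s)" % (type(self).__name__, ", ".join(repr(a) for a in self.args))


class AndNode(BinOpNode):
    pass


class OrNode(BinOpNode):
    pass


# Simplified stand-in for the grammar's term (assumption for this sketch)
term = pp.Word(pp.alphas, pp.alphanums)

logical_term_expression = pp.infixNotation(term, [
    ("&&", 2, pp.opAssoc.LEFT, AndNode),
    ("||", 2, pp.opAssoc.LEFT, OrNode),
])

tree = logical_term_expression.parseString("A && B || C")[0]
print(tree)  # OrNode(AndNode('A', 'B'), 'C')
```

From here, each node class is a natural place to hang evaluation or pretty-printing logic.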
You mentioned using `nestedExpr`: please don't. That helper is best used when writing a scanner for something like C code, where you might want to just jump over some nested braces without actually parsing the contents. In your DSL you will want to parse everything properly, and I think `infixNotation` may be all you need.
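For contrast, here's the kind of job `nestedExpr` is good at: skipping a brace-delimited body without understanding it. This is a hypothetical C-scanner fragment (`braces` and `c_function` are illustrative names) that uses `pp.originalTextFor` to keep the skipped body as raw text. On its own, `nestedExpr` just returns nested lists of whitespace-separated chunks such as `'(x)'` and `'1;'`, which is why it isn't a substitute for a real grammar in your DSL.

```python
import pyparsing as pp

# nestedExpr on its own: nested lists of raw, whitespace-separated chunks
braces = pp.nestedExpr("{", "}")
print(braces.parseString("{ if (x) { y = 1; } }").asList())
# [['if', '(x)', ['y', '=', '1;']]]

# Hypothetical C-scanner fragment: read a function name, then skip its body
c_function = (
    pp.Word(pp.alphas + "_", pp.alphanums + "_")("name")
    + pp.Suppress("()")
    + pp.originalTextFor(pp.nestedExpr("{", "}"))
)
result = c_function.parseString("main() { if (x) { y = 1; } }")
print(result.asList())  # ['main', '{ if (x) { y = 1; } }']
```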