I am using pyparsing
to write a small grammar, but I'm running into an issue where the definition of one variable requires me to use another variable that itself requires the first one, etc. This answer helped me some, since I am working with Prolog strings, but it didn't seem to address the issue I'm having.
import pyparsing as pp
# grammar
argument = pp.Word(pp.alphas)
pred = pp.Word(pp.alphas) + (pp.Suppress('(') + pp.delimitedList(argument) + pp.Suppress(')'))
# tests
test_string1="func(x, y)"
test_result1 = pred.parse_string(test_string1)
print(test_result1.dump())
import pyparsing as pp
# grammar
argument = pp.Word(pp.alphas) | pred
pred = pp.Word(pp.alphas) + (pp.Suppress('(') + pp.delimitedList(argument) + pp.Suppress(')'))
# tests
test_string2="func(func(x), y)"
test_result2 = pred.parse_string(test_string2)
print(test_result2.dump())
The idea is that a predication might have as one of its arguments another predication which itself might have more arguments or predications etc. However, in the first line of the grammar in the broken example, the code breaks because it is trying to reference pred
, which hasn't been assigned yet. I can switch the order, but the problem will persist, because pred
requires the usage of argument
since a pred
can have an argument
.
On paper I think it should work, since eventually it would terminate, but because of the fact that I have to declare things in a certain order it doesn't seem to work in code form.
This is a common issue with grammars, where there is some recursion in the definition of things.
I often encourage parser developers to take a moment and write up a BNF before diving into writing code. This is good practice whether you are using pyparsing or any other parsing library. Here is one that I came up with, based on your code:
BNF:
pred ::= identifier '(' args ')'
args ::= argument [, argument]...
argument ::= identifier | pred
identifier ::= word of one or more alphas
(Jumping ahead, pyparsing will create a railroad diagram of the BNF for you.)
Now you can do a mental walkthrough of your test string using this BNF. Looking at your string "func(func(x), y)", I have some questions about this BNF:
identifier | pred
may cause some ambiguity when parsing with pyparsing, since pred
also starts with an identifier
. We will either need to use '^' operator for pyparsing (which does a longest match), or reorder to match pred | identifier
so that the more complex expression is tried before the simpler one.You rightly determine that there is recursion in this grammar. Pyparsing lets you define a "I need to use this now but I'll define it later" term using the Forward
class (from the "forward" declaration in Pascal).
pred = pp.Forward()
Then translating this to pyparsing (working bottom-up) looks like:
identifier = pp.Word(pp.alphas)
argument = pred | identifier
args = pp.delimitedList(argument)
And then to insert the definition into the existing pred
term, use the <<=
operator instead of =
:
pred <<= identifier + pp.Suppress("(") + args + pp.Suppress(")")
I got so tired of writing test loops for test inputs to parsers that I wrote the run_tests() method, which you would call like this:
pred.run_tests("""\
func(func(x), y)
""")
Which echoes the input string and then dumps the output:
func(func(x), y)
['func', 'func', 'x', 'y']
So a couple more notes:
To make the arguments optional, wrap in a pp.Opt()
To preserve structure, wrap logical groups of terms in pp.Group. I would suggest wrapping both pred
and the '('+args+')' part of pred
:
pred <<= pp.Group(identifier + pp.Group(pp.Suppress("(") + args + pp.Suppress(")")))
Which will then give this run_tests output:
func(func(x), y)
[['func', [['func', ['x']], 'y']]]
[0]:
['func', [['func', ['x']], 'y']]
[0]:
func
[1]:
[['func', ['x']], 'y']
[0]:
['func', ['x']]
[0]:
func
[1]:
['x']
[1]:
y