
Weird early EOF termination of Forward() parser_element


After perusing and debugging several Forward() features in the pyparsing examples, I've cobbled several of them together as needed to parse the ISC Bind9/DHCP configuration file format:

  • Pushing/popping an '!' symbol onto the exprStack
  • Forward()
  • Reusing parsing_common.ipv4_address

There is one EBNF rule (detailed in this Zytrax link) that I am struggling with:

address_match_list = element ; [ element; ... ]

element = [!] (ip [/prefix] | key key-name | "acl_name" | { address_match_list } )

My final (but failing best-fit) draft is:

element = Forward()
element <<= (
    # Hide the exclamation so we can do deeper parse cleaner w/o clutter of '!'
    (0, None) * Word('!') +

    # Might be nice to do a bit of lookahead for '.', ':', 'key', and '"'
    # | is matchFirst, not matchLongest
    # ^ is matchLongest
    (
        ZeroOrMore(
            (
                # Typical pattern "1.2.3.4/24;"
                (
                    Combine(
                        pyparsing_common.ipv4_address + '/' + Word(nums, max=3)
                    ) + ';'
                ) ^                                        # Start: '999.999.999.999/99'
                # Typical pattern "2.3.4.5;"
                (pyparsing_common.ipv4_address + ';') ^    # Start: '999.999.999.999'
                # Typical pattern "3210::1;"
                (pyparsing_common.ipv6_address + ';') ^    # Start: 'XXXX:'
                (Keyword('key') + Word(alphanums, max=63) + ';')
                                                           # Start: 'key <key-varname>'
            )
        ) ^
        # Typical pattern "{ 1.2.3.4; };"
        ZeroOrMore('{' - element + '}' + ';')
    ).setParseAction(pushFirst)
).setParseAction(pushExclamation)

I then ran element.runTests():

element.runTests('2.2.2.2; { 3.3.3.3; };')
2.2.2.2; { 3.3.3.3; };
         ^
FAIL: Expected end of text, found '{'  (at char 9), (line:1, col:10)

The unexpected 'Expected end of text' after matching the first element is what stops the entire parser.
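One possible explanation, sketched with stand-in expressions rather than the real grammar: ZeroOrMore(A) ^ ZeroOrMore(B) picks whichever single run is longer, so it cannot switch from A-items to B-items partway through the input. Because runTests() requires the whole line to parse by default, the leftover text is then reported as 'Expected end of text'. Moving the alternation inside the repetition, ZeroOrMore(A ^ B), lets the items interleave. The names num, word, either_run, mixed_run and parses_fully are hypothetical and exist only for this illustration.

```python
from pyparsing import ParseException, Word, ZeroOrMore, alphas, nums

num = Word(nums)     # stand-in for the "ip;" style alternatives
word = Word(alphas)  # stand-in for the "{ ... };" alternative

# A run of numbers OR a run of words, never a mixture of both
either_run = ZeroOrMore(num) ^ ZeroOrMore(word)

# Any mixture of numbers and words
mixed_run = ZeroOrMore(num ^ word)


def parses_fully(expr, text):
    """Return True if expr consumes all of text (like runTests' parseAll=True)."""
    try:
        expr.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


print(parses_fully(either_run, "1 2 3"))  # a homogeneous run is accepted
print(parses_fully(either_run, "1 a 2"))  # stops after "1", text is left over
print(parses_fully(mixed_run, "1 a 2"))   # alternation inside the loop
```

The second call is the analogue of '2.2.2.2; { 3.3.3.3; };': the first repetition ends at the '{', and nothing is left in the grammar to consume the rest.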

Here is a working standalone snippet that demonstrates the problem:

#!/usr/bin/env python3
# EBNF detailed at http://www.zytrax.com/books/dns/ch7/address_match_list.html
from pyparsing import *
exprStack = []

def pushFirst(strg, loc, toks):
    exprStack.append(toks[0])

def pushExclamation(strg, loc, toks):
    for t in toks:
        if t == '!':
            exprStack.append('!')
        else:
            break

# Address_Match_List (AML)
# This AML combo is ordered very carefully so that the longest patterns are tried first
#
# EBNF reiterated here:
#
#    address_match_list = element ; [ element; ... ]
#
#    element = [!] (ip [/prefix] | key key-name | "acl_name" | { address_match_list } )
#
element = Forward()
element <<= (
    # Hide the exclamation so we can do deeper parse cleaner w/o clutter of '!'
    (0, None) * Word('!') +

    # Might be nice to do a bit of lookahead for '.', ':', 'key', and '"'
    # | is matchFirst, not matchLongest
    # ^ is matchLongest
    (
        ZeroOrMore(
            (
                # Typical pattern "1.2.3.4/24;"
                (
                    Combine(
                        pyparsing_common.ipv4_address + '/' + Word(nums, max=3)
                    ) + ';'
                ) ^                                        # Start: '999.999.999.999/99'
                # Typical pattern "2.3.4.5;"
                (pyparsing_common.ipv4_address + ';') ^    # Start: '999.999.999.999'
                # Typical pattern "3210::1;"
                (pyparsing_common.ipv6_address + ';') ^    # Start: 'XXXX:'
                (
                    Keyword('key') + Word(alphanums, max=63) + ';'
                )                                          # Start: 'key <key-variable-name>'
            )
        ) ^
        # Typical pattern "{ 1.2.3.4; };"
        ZeroOrMore('{' + element + '}' + ';')
    ).setParseAction(pushFirst)
).setParseAction(pushExclamation)
element.setName('"element ;"')
element.setDebug()

result = element.runTests("""
123.123.123.123;
!210.210.210.210;
{ 234.234.234.234 };
2.2.2.2; { 3.3.3.3; };
{ 4.4.4.4; }; 5.5.5.5;
{ 6.6.6.6; 7.7.7.7; }; 8.8.8.8;
!{ 9.9.9.9; 10.10.10.10; };
12.12.12.12; !13.13.13.13;
14.14.14.14/15; 16.16.16.16; key MySha512Key;
17.17.17.17/18; { 19.19.19.19; }; key YourSha512Key; }
""")

import pprint
pp = pprint.PrettyPrinter(indent=4)
print("Result: ")
pp.pprint(result)

Test Run of Valid Syntax Contents

Complete element.runTests() output:


123.123.123.123;
['123.123.123.123', ';']

!210.210.210.210;
['!', '210.210.210.210', ';']

{ 234.234.234.234 };
^
FAIL: exception raised in parse action  (at char 0), (line:1, col:1)

2.2.2.2; { 3.3.3.3; };
         ^
FAIL: Expected end of text, found '{'  (at char 9), (line:1, col:10)

{ 4.4.4.4; }; 5.5.5.5;
              ^
FAIL: Expected end of text, found '5'  (at char 14), (line:1, col:15)

{ 6.6.6.6; 7.7.7.7; }; 8.8.8.8;
                       ^
FAIL: Expected end of text, found '8'  (at char 23), (line:1, col:24)

!{ 9.9.9.9; 10.10.10.10; };
['!', '{', '9.9.9.9', ';', '10.10.10.10', ';', '}', ';']

12.12.12.12; !13.13.13.13;
             ^
FAIL: Expected end of text, found '!'  (at char 13), (line:1, col:14)

14.14.14.14/15; 16.16.16.16; key MySha512Key;
['14.14.14.14/15', ';', '16.16.16.16', ';', 'key', 'MySha512Key', ';']

17.17.17.17/18; { 19.19.19.19; }; key YourSha512Key; }
                ^
FAIL: Expected end of text, found '{'  (at char 16), (line:1, col:17)

Pretty-printed result is:

Result: 
(   False,
    [   ('123.123.123.123;', (['123.123.123.123', ';'], {})),
        ('!210.210.210.210;', (['!', '210.210.210.210', ';'], {})),
        (   '{ 234.234.234.234 };',
            exception raised in parse action  (at char 0), (line:1, col:1)),
        (   '2.2.2.2; { 3.3.3.3; };',
            Expected end of text, found '{'  (at char 9), (line:1, col:10)),
        (   '{ 4.4.4.4; }; 5.5.5.5;',
            Expected end of text, found '5'  (at char 14), (line:1, col:15)),
        (   '{ 6.6.6.6; 7.7.7.7; }; 8.8.8.8;',
            Expected end of text, found '8'  (at char 23), (line:1, col:24)),
        (   '!{ 9.9.9.9; 10.10.10.10; };',
            (['!', '{', '9.9.9.9', ';', '10.10.10.10', ';', '}', ';'], {})),
        (   '12.12.12.12; !13.13.13.13;',
            Expected end of text, found '!'  (at char 13), (line:1, col:14)),
        (   '14.14.14.14/15; 16.16.16.16; key MySha512Key;',
            (['14.14.14.14/15', ';', '16.16.16.16', ';', 'key', 'MySha512Key', ';'], {})),
        (   '17.17.17.17/18; { 19.19.19.19; }; key YourSha512Key; }',
            Expected end of text, found '{'  (at char 16), (line:1, col:17))])

Process finished with exit code 0

I'm still slowly debugging the 234.234.234.234; and 3.3.3.3; cases, so I hope that someone will glance at this and say 'there it is' in the meantime.
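The 'exception raised in parse action' failures likely have a simple cause: pushFirst reads toks[0], and a ZeroOrMore that matches nothing hands the parse action an empty token list. pyparsing reports the resulting IndexError as a ParseException with that message. A minimal sketch of the effect, using a stand-in grammar (the names push_first, push_first_safe, seen and error_message are illustrative and not part of the original code):

```python
from pyparsing import ParseException, Word, ZeroOrMore, nums

seen = []


def push_first(strg, loc, toks):
    # Raises IndexError when toks is empty (nothing was matched)
    seen.append(toks[0])


def push_first_safe(strg, loc, toks):
    # Only push when something was actually matched
    if toks:
        seen.append(toks[0])


fragile = ZeroOrMore(Word(nums)).setParseAction(push_first)
guarded = ZeroOrMore(Word(nums)).setParseAction(push_first_safe)


def error_message(expr, text):
    """Return the ParseException text, or None if parsing succeeded."""
    try:
        expr.parseString(text)
        return None
    except ParseException as exc:
        return str(exc)


print(error_message(fragile, "abc"))  # zero matches, the action fails
print(error_message(guarded, "abc"))  # zero matches, the action is a no-op
```

Guarding the parse action is only one option; replacing ZeroOrMore with OneOrMore where an empty match makes no sense would avoid the empty call as well.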

Test Run of Purposely-Failed Syntax

UPDATED: Added test code of purposely failed syntax contents:

result = element.runTests("""
20
!
key;
21;
{ 23 };
{ 24.24.24.24 };
{ 25.25.25.25; }
26.26.26.26
27.27.27.27; key
28.28.28.28; { key }
29.29.29.29, 30.30.30.30;
{ 31.31.31.31; 32.32.32.32; }
{ 33.33.33.33; 34.34.34.34; }; 35;
""", failureTests=True)
print("Result of failed contents: ")
pp.pprint(result)

Test run of failed content (pretty-print-format):

Result of failed contents: 
(   True,
    [   ('20', exception raised in parse action  (at char 0), (line:1, col:1)),
        ('!', exception raised in parse action  (at char 0), (line:1, col:1)),
        (   'key;',
            exception raised in parse action  (at char 0), (line:1, col:1)),
        ('21;', exception raised in parse action  (at char 0), (line:1, col:1)),
        (   '{ 23 };',
            exception raised in parse action  (at char 0), (line:1, col:1)),
        (   '{ 24.24.24.24 };',
            exception raised in parse action  (at char 0), (line:1, col:1)),
        (   '{ 25.25.25.25; }',
            exception raised in parse action  (at char 0), (line:1, col:1)),
        (   '26.26.26.26',
            exception raised in parse action  (at char 0), (line:1, col:1)),
        (   '27.27.27.27; key',
            Expected end of text, found 'k'  (at char 13), (line:1, col:14)),
        (   '28.28.28.28; { key }',
            Expected end of text, found '{'  (at char 13), (line:1, col:14)),
        (   '29.29.29.29, 30.30.30.30;',
            exception raised in parse action  (at char 0), (line:1, col:1)),
        (   '{ 31.31.31.31; 32.32.32.32; }',
            exception raised in parse action  (at char 0), (line:1, col:1)),
        (   '{ 33.33.33.33; 34.34.34.34; }; 35;',
            Expected end of text, found '3'  (at char 31), (line:1, col:32))])

Process finished with exit code 0
Match "element ;" at loc 0(1,1)
Exception raised:exception raised in parse action  (at char 0), (line:1, col:1)

20
^
FAIL: exception raised in parse action  (at char 0), (line:1, col:1)
Match "element ;" at loc 0(1,1)
Exception raised:exception raised in parse action  (at char 0), (line:1, col:1)

!
^
FAIL: exception raised in parse action  (at char 0), (line:1, col:1)
Match "element ;" at loc 0(1,1)
Exception raised:exception raised in parse action  (at char 0), (line:1, col:1)

key;
^
FAIL: exception raised in parse action  (at char 0), (line:1, col:1)
Match "element ;" at loc 0(1,1)
Exception raised:exception raised in parse action  (at char 0), (line:1, col:1)

21;
^
FAIL: exception raised in parse action  (at char 0), (line:1, col:1)
Match "element ;" at loc 0(1,1)
Match "element ;" at loc 1(1,2)
Matched "element ;" -> []
Exception raised:exception raised in parse action  (at char 0), (line:1, col:1)

{ 23 };
^
FAIL: exception raised in parse action  (at char 0), (line:1, col:1)
Match "element ;" at loc 0(1,1)
Match "element ;" at loc 1(1,2)
Matched "element ;" -> []
Exception raised:exception raised in parse action  (at char 0), (line:1, col:1)

{ 24.24.24.24 };
^
FAIL: exception raised in parse action  (at char 0), (line:1, col:1)
Match "element ;" at loc 0(1,1)
Match "element ;" at loc 1(1,2)
Matched "element ;" -> ['25.25.25.25', ';']
Exception raised:exception raised in parse action  (at char 0), (line:1, col:1)

{ 25.25.25.25; }
^
FAIL: exception raised in parse action  (at char 0), (line:1, col:1)
Match "element ;" at loc 0(1,1)
Exception raised:exception raised in parse action  (at char 0), (line:1, col:1)

26.26.26.26
^
FAIL: exception raised in parse action  (at char 0), (line:1, col:1)
Match "element ;" at loc 0(1,1)
Matched "element ;" -> ['27.27.27.27', ';']

27.27.27.27; key
             ^
FAIL: Expected end of text, found 'k'  (at char 13), (line:1, col:14)
Match "element ;" at loc 0(1,1)
Matched "element ;" -> ['28.28.28.28', ';']

28.28.28.28; { key }
             ^
FAIL: Expected end of text, found '{'  (at char 13), (line:1, col:14)
Match "element ;" at loc 0(1,1)
Exception raised:exception raised in parse action  (at char 0), (line:1, col:1)

29.29.29.29, 30.30.30.30;
^
FAIL: exception raised in parse action  (at char 0), (line:1, col:1)
Match "element ;" at loc 0(1,1)
Match "element ;" at loc 1(1,2)
Matched "element ;" -> ['31.31.31.31', ';', '32.32.32.32', ';']
Exception raised:exception raised in parse action  (at char 0), (line:1, col:1)

{ 31.31.31.31; 32.32.32.32; }
^
FAIL: exception raised in parse action  (at char 0), (line:1, col:1)
Match "element ;" at loc 0(1,1)
Match "element ;" at loc 1(1,2)
Matched "element ;" -> ['33.33.33.33', ';', '34.34.34.34', ';']
Match "element ;" at loc 1(1,2)
Matched "element ;" -> ['33.33.33.33', ';', '34.34.34.34', ';']
Matched "element ;" -> ['{', '33.33.33.33', ';', '34.34.34.34', ';', '}', ';']

{ 33.33.33.33; 34.34.34.34; }; 35;
                               ^
FAIL: Expected end of text, found '3'  (at char 31), (line:1, col:

UPDATE: Following the answer provided by Paul McG, I've updated the code snippet with his suggestion.

Before I get to that, I found two more errors in my two test runs (valid syntax and invalid syntax); both errors were in the valid-syntax test input. I've updated the test snippet to:

result = element.runTests("""
123.123.123.123;
!210.210.210.210;
{ 234.234.234.234; };
2.2.2.2; { 3.3.3.3; };
{ 4.4.4.4; }; 5.5.5.5;
{ 6.6.6.6; 7.7.7.7; }; 8.8.8.8;
!{ 9.9.9.9; 10.10.10.10; };
12.12.12.12; !13.13.13.13;
14.14.14.14/15; 16.16.16.16; key MySha512Key;
17.17.17.17/18; { 19.19.19.19; }; key YourSha512Key;
""")
print("Result of valid contents: ")
pp.pprint(result)

Now the test result is narrowed down to just one failing syntax:

Result of valid contents: 
(   False,
    [   ('123.123.123.123;', (['123.123.123.123', ';'], {})),
        ('!210.210.210.210;', (['!', '210.210.210.210', ';'], {})),
        (   '{ 234.234.234.234; };',
            ([(['{', '234.234.234.234', ';', '}', ';'], {})], {})),
        (   '2.2.2.2; { 3.3.3.3; };',
            (['2.2.2.2', ';', (['{', '3.3.3.3', ';', '}', ';'], {})], {})),
        (   '{ 4.4.4.4; }; 5.5.5.5;',
            ([(['{', '4.4.4.4', ';', '}', ';'], {}), '5.5.5.5', ';'], {})),
        (   '{ 6.6.6.6; 7.7.7.7; }; 8.8.8.8;',
            ([(['{', '6.6.6.6', ';', '7.7.7.7', ';', '}', ';'], {}), '8.8.8.8', ';'], {})),
        (   '!{ 9.9.9.9; 10.10.10.10; };',
            (['!', (['{', '9.9.9.9', ';', '10.10.10.10', ';', '}', ';'], {})], {})),
        (   '12.12.12.12; !13.13.13.13;',
            Expected end of text, found '!'  (at char 13), (line:1, col:14)),
        (   '14.14.14.14/15; 16.16.16.16; key MySha512Key;',
            (['14.14.14.14/15', ';', '16.16.16.16', ';', 'key', 'MySha512Key', ';'], {})),
        (   '17.17.17.17/18; { 19.19.19.19; }; key YourSha512Key;',
            (['17.17.17.17/18', ';', (['{', '19.19.19.19', ';', '}', ';'], {}), 'key', 'YourSha512Key', ';'], {}))])

This is a major step forward.

I've noticed the following fundamental changes:

  • introduction of delimitedList()
  • the ZeroOrMore got consolidated within the Forward()
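For reference, delimitedList(expr, delim=';') matches one or more occurrences of expr separated by the delimiter and drops the delimiters from the results. A minimal sketch with a stand-in Word(nums) expression instead of the element grammar (the name numbers is illustrative):

```python
from pyparsing import Word, delimitedList, nums

# One or more integers separated by semicolons; the ';' tokens are suppressed
numbers = delimitedList(Word(nums), delim=';')

print(numbers.parseString('1; 2; 3').asList())

# A trailing delimiter is not consumed by delimitedList itself; without
# parseAll=True the parse simply stops before it.
print(numbers.parseString('1; 2; 3;').asList())
```

Since a trailing ';' is left unconsumed, a grammar whose items each end in ';' still has to match that final semicolon somewhere else.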

We are left with one error: an exclamation mark on the second element of a list is rejected.

import pprint
pp = pprint.PrettyPrinter(indent=4)
result = element.runTests("""
12.12.12.12; !13.13.13.13;
""")
print("Result of valid contents: ")
pp.pprint(result)

Test Result is:

Match "element ;" at loc 0(1,1)
Matched "element ;" -> ['12.12.12.12', ';']

12.12.12.12; !13.13.13.13;
             ^
FAIL: Expected end of text, found '!'  (at char 13), (line:1, col:14)
Result of valid contents: 
(   False,
    [   (   '12.12.12.12; !13.13.13.13;',
            Expected end of text, found '!'  (at char 13), (line:1, col:14))])

Final Run of a Working Solution

In the final test code, I've incorporated Paul McG's suggestion and moved the exclamation parser element inside the ZeroOrMore, as shown below:

# Address_Match_List (AML)
# This AML combo is ordered very carefully so that the longest patterns are tried first
#
# EBNF reiterated here:
#
#    address_match_list = element ; [ element; ... ]
#
#    element = [!] (ip [/prefix] | key key-name | "acl_name" | { address_match_list } )
#
element = Forward()
element <<= (
    # Might be nice to do a bit of lookahead for '.', ':', 'key', and '"'
    # | is matchFirst, not matchLongest
    # ^ is matchLongest
    ZeroOrMore(
        # Hide the exclamation so we can do deeper parse cleaner w/o clutter of '!'
        (0, None) * Word('!') +
        (
                (
                        (Combine(pyparsing_common.ipv4_address + '/' + Word(nums, max=3)) + ';')
                        ^ (pyparsing_common.ipv4_address + ';')
                        ^ (pyparsing_common.ipv6_address + ';')
                        ^ (Keyword('key') + Word(alphanums, max=63) + ';')
                        ^ Keyword('acl_name')
                ).setParseAction(pushFirst)
                ^ Group('{' - delimitedList(element, delim=';') + '}' + ';')
        )
    )
).setParseAction(pushExclamation)
element.setName('"element ;"')
element.setDebug()

import pprint

pp = pprint.PrettyPrinter(indent=4)
result = element.runTests("""
123.123.123.123;
!210.210.210.210;
{ 234.234.234.234; };
2.2.2.2; { 3.3.3.3; };
{ 4.4.4.4; }; 5.5.5.5;
{ 6.6.6.6; 7.7.7.7; }; 8.8.8.8;
!{ 9.9.9.9; 10.10.10.10; };
12.12.12.12; !13.13.13.13;
14.14.14.14/15; 16.16.16.16; key MySha512Key;
17.17.17.17/18; { 19.19.19.19; }; key YourSha512Key;
""")
print("Result of valid contents: ")
pp.pprint(result)

The test run above produces the following result for the valid syntax contents:

Result of valid contents: 
(   True,
    [   ('123.123.123.123;', (['123.123.123.123', ';'], {})),
        ('!210.210.210.210;', (['!', '210.210.210.210', ';'], {})),
        (   '{ 234.234.234.234; };',
            ([(['{', '234.234.234.234', ';', '}', ';'], {})], {})),
        (   '2.2.2.2; { 3.3.3.3; };',
            (['2.2.2.2', ';', (['{', '3.3.3.3', ';', '}', ';'], {})], {})),
        (   '{ 4.4.4.4; }; 5.5.5.5;',
            ([(['{', '4.4.4.4', ';', '}', ';'], {}), '5.5.5.5', ';'], {})),
        (   '{ 6.6.6.6; 7.7.7.7; }; 8.8.8.8;',
            ([(['{', '6.6.6.6', ';', '7.7.7.7', ';', '}', ';'], {}), '8.8.8.8', ';'], {})),
        (   '!{ 9.9.9.9; 10.10.10.10; };',
            (['!', (['{', '9.9.9.9', ';', '10.10.10.10', ';', '}', ';'], {})], {})),
        (   '12.12.12.12; !13.13.13.13;',
            (['12.12.12.12', ';', '!', '13.13.13.13', ';'], {})),
        (   '14.14.14.14/15; 16.16.16.16; key MySha512Key;',
            (['14.14.14.14/15', ';', '16.16.16.16', ';', 'key', 'MySha512Key', ';'], {})),
        (   '17.17.17.17/18; { 19.19.19.19; }; key YourSha512Key;',
            (['17.17.17.17/18', ';', (['{', '19.19.19.19', ';', '}', ';'], {}), 'key', 'YourSha512Key', ';'], {}))])

Wow. The answer below fixed the problem. I need to grapple with it some more so I can give a better summary as to the "why".

From here it should be smooth skating to fill out the rest of the ISC-style configuration grammar.


Solution

  • This might get you closer, but I'm not sure it is doing the stack bits correctly.

    element = Forward()
    element <<= (
        # Hide the exclamation so we can do deeper parse cleaner w/o clutter of '!'
        (0, None) * Word('!') +
    
        # Might be nice to do a bit of lookahead for '.', ':', 'key', and '"'
        # | is matchFirst, not matchLongest
        # ^ is matchLongest
        ZeroOrMore(
            (
                (Combine(pyparsing_common.ipv4_address + '/' + Word(nums, max=3)) + ';')
                ^ (pyparsing_common.ipv4_address + ';')
                ^ (pyparsing_common.ipv6_address + ';')
                ^ (Keyword('key') + Word(alphanums, max=63) + ';')
                ^ Keyword('acl_name')
            ).setParseAction(pushFirst)
            ^ Group('{' - delimitedList(element, delim=';') + '}' + ';')
        )
    ).setParseAction(pushExclamation)
    

    I've started formatting my long expressions with the operator at the beginning of the next line; this feels more readable to me. I'm guessing you might want the elements in {}'s to be kept in their own subgroup, so I grouped them. And if you want to get rid of clutter, all those semicolons look like they could be suppressed, provided you structure your results suitably.
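    To illustrate the suppression idea, here is a reduced sketch that handles only IPv4 addresses, '!' and nested braces. It is an assumption about how the results could be structured, not the grammar above: it leaves out the stack-pushing parse actions, and the names SEMI, LBRACE, RBRACE, nested and address_match_list are made up for the example.

```python
from pyparsing import (Forward, Group, OneOrMore, Optional, Suppress,
                       pyparsing_common)

# Punctuation is matched but kept out of the results
SEMI = Suppress(';')
LBRACE = Suppress('{')
RBRACE = Suppress('}')

element = Forward()

# Everything between braces becomes its own sublist
nested = Group(LBRACE + OneOrMore(element) + RBRACE)

# element = [!] ( ip | { address_match_list } ) ;
element <<= Optional('!') + (pyparsing_common.ipv4_address | nested) + SEMI

address_match_list = OneOrMore(element)

result = address_match_list.parseString(
    '2.2.2.2; !{ 3.3.3.3; 4.4.4.4; };', parseAll=True)
print(result.asList())
# -> ['2.2.2.2', '!', ['3.3.3.3', '4.4.4.4']]
```

    With the punctuation suppressed, the nesting of the input is carried by the nesting of the result lists alone.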