
How do I correctly name ParseResults?


I like to name the entities in my grammar so I can access them using the as_dict() feature of ParseResults. But somehow it is not obvious to me where exactly I should "group" and "name" them. This often results in some kind of trial and error process.

To make clearer what I mean, I stripped the problem down to a minimal example:

If we define an identifier that is labelled with "I" and holds the name of the identifier:

from  pyparsing import *

identifier = Word(alphas,nums)
gid        = Group(identifier("I"))
idg        = Group(identifier)("I")

t=gid.parseString("x1")
print(t.as_dict(), t.as_list())
t=idg.parseString("x1")
print(t.as_dict(), t.as_list())

results in:

{} [['x1']]
{'I': ['x1']} [['x1']]

which suggests that I should first "Group" then "name" the identifier.

However, if I use a sequence of these (named "P"), the opposite seems to be true, as this (continued) example shows:

prog= [
    Group(ZeroOrMore(gid)).setResultsName("P"),
    Group(ZeroOrMore(idg)).setResultsName("P"),
]

s = "x1 x2"

for i in range(0,len(prog)):
    t=prog[i].parseString(s)
    print(t.as_dict(), t.as_list())
    for v in t.P:
        print(v.as_dict(), t.as_list())

which outputs:

{'P': [{'I': 'x1'}, {'I': 'x2'}]} [[['x1'], ['x2']]]
{'I': 'x1'} [[['x1'], ['x2']]]
{'I': 'x2'} [[['x1'], ['x2']]]
{'P': {'I': ['x2']}} [[['x1'], ['x2']]]
{} [[['x1'], ['x2']]]
{} [[['x1'], ['x2']]]

Am I doing something wrong? Or did I just misunderstand named results?

Cheers, Alex


Solution

  • Welcome to pyparsing! Grouping and results names are really important features to get a good understanding of, for making parsers with useful results, so it's great that you are learning these basics.

    I had suggested using create_diagram() to better see the structure and the names for these expressions. But they are almost too simple for the diagrams to really show much. As you work with pyparsing further, you might come back to using create_diagram to make parser railroad diagrams for your pyparsing parsers.

    Instead, I replicated your steps, but instead of using results.as_dict() and results.as_list() (where results is the pyparsing ParseResults value returned from calling parse_string()), I used another visualizing method, results.dump(). dump() prints out results.as_list(), followed by an indented list of the items by results name, and then by sub-lists. I think dump() will show a little better how names and groups work in your expressions.

    One of the main points is that as_dict() will only walk named items. Suppose you had an expression for two identifiers like this (where only one expression has a results name):

    two_idents = identifier() + identifier("final")
    

    Then print(two_idents.parse_string("x1 x2").as_list()) will print:

    ['x1', 'x2']
    

    But print(two_idents.parse_string("x1 x2").as_dict()) will only show:

    {'final': 'x2'}
    

    because only the second item has a name. (This would even be the case if the unnamed item was a group containing a sub-expression with a results name. as_dict() only walks items with results names, so the unnamed containing group would be omitted.)

    Here's how dump() would display these:

    ['x1', 'x2']
    - final: 'x2'
    

    It shows that a list view of the results has 'x1' and 'x2', and there is a top-level results name 'final' that points to 'x2'.
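    As a small runnable sketch of the point above (assuming pyparsing 3.x, where parse_string() is available), the following compares the list view, the dict view, and direct access by results name for the same two_idents expression:

```python
import pyparsing as pp

# first character must be a letter, remaining characters digits (as in the question)
identifier = pp.Word(pp.alphas, pp.nums)

# only the second identifier gets a results name
two_idents = identifier + identifier("final")

result = two_idents.parse_string("x1 x2")

print(result.as_list())   # every parsed token, named or not
print(result.as_dict())   # only the named token
print(result["final"])    # access by key
print(result.final)       # access by attribute
print(result.dump())      # list view followed by the named items
```

    The unnamed 'x1' is still present in the list view; it is only missing from as_dict(), which is built from results names alone.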

    Here is my annotated version of your code, and the corresponding as_dict() and dump() output from each:

    from pyparsing import *
    
    identifier = Word(alphas, nums)
    
    # group an expression that has a results name
    gid = Group(identifier("I"))
    
    # group an unnamed expression, and put the results name on the group
    idg = Group(identifier)("I")
    
    # groups with the results name "P" on the outer group
    prog0 = Group(ZeroOrMore(gid)).set_results_name("P")
    prog1 = Group(ZeroOrMore(idg)).set_results_name("P")
    
    # pyparsing short-cut for x.set_name("x") for gid, idg, prog0, and prog1
    autoname_elements()
    
    s = "x1 x2"
    for expr in (gid, idg, prog0, prog1):
        print(expr)  # prints the expression name
        result = expr.parse_string(s)
        print(result.as_dict())
        print(result.dump())
        print()
    

    Gives this output:

    gid
    {}
    [['x1']]
    [0]:
      ['x1']
      - I: 'x1'
    
    idg
    {'I': ['x1']}
    [['x1']]
    - I: ['x1']
    [0]:
      ['x1']
    
    prog0
    {'P': [{'I': 'x1'}, {'I': 'x2'}]}
    [[['x1'], ['x2']]]
    - P: [['x1'], ['x2']]
      [0]:
        ['x1']
        - I: 'x1'
      [1]:
        ['x2']
        - I: 'x2'
    [0]:
      [['x1'], ['x2']]
      [0]:
        ['x1']
        - I: 'x1'
      [1]:
        ['x2']
        - I: 'x2'
    
    prog1
    {'P': {'I': ['x2']}}
    [[['x1'], ['x2']]]
    - P: [['x1'], ['x2']]
      - I: ['x2']
      [0]:
        ['x1']
      [1]:
        ['x2']
    [0]:
      [['x1'], ['x2']]
      - I: ['x2']
      [0]:
        ['x1']
      [1]:
        ['x2']
    

    Explanations:

    • gid is an unnamed group containing a named item. Since there is no top-level named item, as_dict() returns an empty dict.

    • idg is a named group containing an unnamed item. as_dict() returns a dict in which the outer name 'I' maps to the group's contents, a list with the single item 'x1'.

    • prog0 is 0 or more unnamed groups contained in a named group. Each of the contained groups has a named item.

    • prog1 is 0 or more named groups contained in a named group. Since the named groups all have the same results name, only the last one is kept in the results. This is similar to creating a Python dict using the same key multiple times: print({'a':100, 'a':200}) will print {'a': 200}. You can override this default behavior in pyparsing by adding the list_all_matches=True argument to your call to set_results_name. Using list_all_matches=True makes the result act like a defaultdict(list) instead of a dict.
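    To make the last two points concrete, here is a minimal sketch (assuming pyparsing 3.x; idg_all, prog1_all, names0, and all_matches are illustrative names, not part of the original code). It reads the per-group name 'I' out of the prog0 layout, and then builds a prog1 variant whose 'I' name uses list_all_matches=True so that every match is kept:

```python
import pyparsing as pp

identifier = pp.Word(pp.alphas, pp.nums)

# prog0 layout: unnamed groups, each holding a named item "I"
gid = pp.Group(identifier("I"))
prog0 = pp.Group(pp.ZeroOrMore(gid))("P")

# each element of P is its own group, so "I" is looked up per group
names0 = [item.I for item in prog0.parse_string("x1 x2").P]

# prog1 variant: named groups, but keeping every match of "I"
# instead of only the last one
idg_all = pp.Group(identifier).set_results_name("I", list_all_matches=True)
prog1_all = pp.Group(pp.ZeroOrMore(idg_all))("P")

all_matches = prog1_all.parse_string("x1 x2").P.I.as_list()

print(names0)
print(all_matches)
```

    With the prog0 layout the name lives inside each group, so nothing is overwritten. With the prog1 layout all groups share one name at the same level, which is why list_all_matches=True is needed to retain more than the last match.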

    Please visit the pyparsing docs at https://pyparsing-docs.readthedocs.io/en/latest/ and see the additional tips in the pyparsing wiki at https://github.com/pyparsing/pyparsing/wiki .