
pyparsing retrieve resulting keys for strings of dynamic length


I am trying to parse a list of groups with pyparsing. The groups may be of different types, and I would like to retrieve each group's type from the result. Because there may be multiple groups of the same type, a dictionary does not help. Here is a minimal example:

import pyparsing as pars

dot = pars.Literal(".")
question = pars.Literal("?")
comma = pars.Literal(",")

total = pars.OneOrMore(
    pars.Group(
        pars.OneOrMore(dot)("dot")
        | pars.OneOrMore(question)("question")
    )
    + pars.Optional(comma)
)

result = total.parseString("...,?????,..,??")

So a sequence of dots forms a group, and a sequence of question marks forms a group; I named these groups dot and question. The resulting dictionary, however, is empty:

In: result.asDict()
Out: {}

If I print this as XML:

<ITEM>
  <dot>
    <dot>.</dot>
    <ITEM>.</ITEM>
    <ITEM>.</ITEM>
  </dot>
  <ITEM>,</ITEM>
  <question>
    <question>?</question>
    <ITEM>?</ITEM>
    <ITEM>?</ITEM>
    <ITEM>?</ITEM>
    <ITEM>?</ITEM>
  </question>
  <ITEM>,</ITEM>
  <dot>
    <dot>.</dot>
    <ITEM>.</ITEM>
  </dot>
  <ITEM>,</ITEM>
  <question>
    <question>?</question>
    <ITEM>?</ITEM>
  </question>
</ITEM>

Apart from the odd naming of the sub-items, each group tag has the correct name. My question is: how can I iterate through this result without going through the XML? .asList() drops the keys, .asDict() discards multiple items of the same type, and .asXML() returns a string. Is there a way to obtain all the tuples, like so:

for k, v in GETMYTUPLES:
    print(k, v)
    # -> dot [".", ".", "."]
    # -> question ["?", ... and so forth

Solution

  • I generally discourage use of the asXML() method - it is deprecated and will probably go away in version 2.2. If you use dump() instead, you will see that what you have is a sequence of named groups, not a dict, so asDict(), which only gives output for the keyed values, has nothing to work with at the top level.

    print(result.dump())
    
    [['.', '.', '.'], ',', ['?', '?', '?', '?', '?'], ',', ['.', '.'], ',', ['?', '?']]
    [0]:
      ['.', '.', '.']
      - dot: ['.', '.', '.']
    [1]:
      ,
    [2]:
      ['?', '?', '?', '?', '?']
      - question: ['?', '?', '?', '?', '?']
    [3]:
      ,
    [4]:
      ['.', '.']
      - dot: ['.', '.']
    [5]:
      ,
    [6]:
      ['?', '?']
      - question: ['?', '?']
    

    To get each parsed bit, rather than calling asDict() or asList(), just iterate over the result directly. If you call asDict() on each of the list elements, you will see your named values:

    for r in result:
        if isinstance(r, pars.ParseResults):
            print(r.asDict())
    
    {'dot': ['.', '.', '.']}
    {'question': ['?', '?', '?', '?', '?']}
    {'dot': ['.', '.']}
    {'question': ['?', '?']}
    

    You can also use getName() for those sub elements:

    for r in result:
        if isinstance(r, pars.ParseResults):
            print(r, r.getName())
    
    ['.', '.', '.'] dot
    ['?', '?', '?', '?', '?'] question
    ['.', '.'] dot
    ['?', '?'] question
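
To get the (name, tokens) tuples the question asks for, the getName() loop above can be wrapped in a small generator. This is only a sketch: named_groups is a hypothetical helper name, not part of pyparsing, and it assumes the camelCase method names used in this answer are available (newer pyparsing releases also offer snake_case equivalents such as parse_string, get_name and as_list).

```python
import pyparsing as pars

dot = pars.Literal(".")
question = pars.Literal("?")
comma = pars.Literal(",")

# Same grammar as in the question.
total = pars.OneOrMore(
    pars.Group(
        pars.OneOrMore(dot)("dot")
        | pars.OneOrMore(question)("question")
    )
    + pars.Optional(comma)
)

result = total.parseString("...,?????,..,??")


def named_groups(parse_result):
    """Yield (name, tokens) for each named group (hypothetical helper)."""
    for item in parse_result:
        # The commas come back as plain strings; only the groups are ParseResults.
        if isinstance(item, pars.ParseResults):
            yield item.getName(), item.asList()


pairs = list(named_groups(result))
for name, tokens in pairs:
    print(name, tokens)
```

Because the pairs are kept in a list rather than a dict, repeated group types survive in their original order.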
    

    EDIT

    Also, consider replacing:

    total = pars.OneOrMore(
        pars.Group(
            pars.OneOrMore(dot)("dot")
            | pars.OneOrMore(question)("question")
        )
        + pars.Optional(comma)
    )
    

    with

    total = pars.delimitedList(pars.Group(pars.OneOrMore(dot)("dot") |
                                          pars.OneOrMore(question)("question")))
    

    When you have a list of things delimited by commas, the commas are usually there to help at parse time, but afterward, what you really want are just the things. delimitedList does that for you (comma is the default delimiter, but you can pass a different one as the optional delim parameter).
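
As a rough end-to-end sketch of the delimitedList variant, reusing the grammar from the question: the variable names group, semi_total and the ";"-separated input are made up for illustration. With the delimiters suppressed, every element of the result is a named group, so the isinstance check is no longer needed.

```python
import pyparsing as pars

dot = pars.Literal(".")
question = pars.Literal("?")

group = pars.Group(
    pars.OneOrMore(dot)("dot")
    | pars.OneOrMore(question)("question")
)

# Comma is the default delimiter and is dropped from the results.
total = pars.delimitedList(group)
result = total.parseString("...,?????,..,??")

groups = result.asList()
names = [g.getName() for g in result]
print(groups)
print(names)

# A different delimiter can be passed via delim (illustrative input).
semi_total = pars.delimitedList(group, delim=";")
semi_groups = semi_total.parseString("..;???").asList()
print(semi_groups)
```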