I am trying to parse a list of groups with pyparsing. These groups may be of different types, and I would like to retrieve the type in my result. As there may be multiple groups of the same type, a dictionary does not help.
To illustrate my problem, here is a minimal example:
import pyparsing as pars

dot = pars.Literal(".")
question = pars.Literal("?")
comma = pars.Literal(",")
total = pars.OneOrMore(
    pars.Group(
        pars.OneOrMore(dot)("dot")
        | pars.OneOrMore(question)("question")
    )
    + pars.Optional(comma)
)
result = total.parseString("...,?????,..,??")
So a sequence of dots forms a group and a sequence of question marks forms a group. I therefore named these groups dot and question.
The resulting dictionary, however, is empty:
In: result.asDict()
Out: {}
If I print this as XML, I get:
<ITEM>
  <dot>
    <dot>.</dot>
    <ITEM>.</ITEM>
    <ITEM>.</ITEM>
  </dot>
  <ITEM>,</ITEM>
  <question>
    <question>?</question>
    <ITEM>?</ITEM>
    <ITEM>?</ITEM>
    <ITEM>?</ITEM>
    <ITEM>?</ITEM>
  </question>
  <ITEM>,</ITEM>
  <dot>
    <dot>.</dot>
    <ITEM>.</ITEM>
  </dot>
  <ITEM>,</ITEM>
  <question>
    <question>?</question>
    <ITEM>?</ITEM>
  </question>
</ITEM>
Apart from the weird naming of the sub-items, the group tag has the correct name. My question is: how can I iterate through this result without going through the XML? .asList() drops the keys, .asDict() discards multiple items of the same type, and .asXML() returns a string. Isn't there a way to obtain all tuples like so:
for k, v in GETMYTUPLES:
    print(k, v)
-> dot [".", ".", "."]
-> question ["?", ... and so forth
I generally discourage use of the asXML() method: it is deprecated and will probably go away in version 2.2. If you use dump() instead, you will see that what you have is a sequence of named groups, not a dict, so asDict(), which only gives output for the keyed values, has nothing to work with at the top level.
print(result.dump())
[['.', '.', '.'], ',', ['?', '?', '?', '?', '?'], ',', ['.', '.'], ',', ['?', '?']]
[0]:
['.', '.', '.']
- dot: ['.', '.', '.']
[1]:
,
[2]:
['?', '?', '?', '?', '?']
- question: ['?', '?', '?', '?', '?']
[3]:
,
[4]:
['.', '.']
- dot: ['.', '.']
[5]:
,
[6]:
['?', '?']
- question: ['?', '?']
To get each parsed bit, rather than calling asDict() or asList(), just iterate over the result directly. If you call asDict() on each of the list elements, you will see your named values:
for r in result:
    if isinstance(r, pars.ParseResults):
        print(r.asDict())
{'dot': ['.', '.', '.']}
{'question': ['?', '?', '?', '?', '?']}
{'dot': ['.', '.']}
{'question': ['?', '?']}
You can also use getName() for those sub-elements:
for r in result:
    if isinstance(r, pars.ParseResults):
        print(r, r.getName())
['.', '.', '.'] dot
['?', '?', '?', '?', '?'] question
['.', '.'] dot
['?', '?'] question
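To get the (name, tokens) tuples asked for in the question, the two loops above can be combined. The following is only a sketch: named_groups is a hypothetical helper name, not part of pyparsing, and it assumes the camelCase methods used throughout this post (parseString, getName, asList) are available in your pyparsing version.

```python
import pyparsing as pars

dot = pars.Literal(".")
question = pars.Literal("?")
comma = pars.Literal(",")
total = pars.OneOrMore(
    pars.Group(
        pars.OneOrMore(dot)("dot")
        | pars.OneOrMore(question)("question")
    )
    + pars.Optional(comma)
)
result = total.parseString("...,?????,..,??")


def named_groups(results):
    """Yield (name, tokens) for each nested group.

    Hypothetical helper for illustration. Plain strings such as the
    commas are skipped because they are not ParseResults.
    """
    for r in results:
        if isinstance(r, pars.ParseResults):
            yield r.getName(), r.asList()


pairs = list(named_groups(result))
for k, v in pairs:
    print(k, v)
```

Because the helper yields tuples in parse order, repeated group types are kept instead of being collapsed the way a single dict would collapse them.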
EDIT
Also, consider replacing:
total = pars.OneOrMore(
    pars.Group(
        pars.OneOrMore(dot)("dot")
        | pars.OneOrMore(question)("question")
    )
    + pars.Optional(comma)
)
with
total = pars.delimitedList(pars.Group(pars.OneOrMore(dot)("dot") |
                                      pars.OneOrMore(question)("question")))
When you have a list of things delimited by commas, the commas are usually there to help at parse time, but afterward, what you really want are just the things. delimitedList does that for you (comma is the default delimiter, but you can pass a different one as the optional delim parameter).
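As a rough illustration of the delimitedList variant, the sketch below parses the same input and collects the named groups. It assumes the same dot and question definitions as in the question; the variable names group and pairs are made up for this example. Since the delimiters are suppressed, every element of the result is a group and no isinstance check is needed.

```python
import pyparsing as pars

dot = pars.Literal(".")
question = pars.Literal("?")

# One group per run of dots or question marks, each tagged with a results name.
group = pars.Group(
    pars.OneOrMore(dot)("dot")
    | pars.OneOrMore(question)("question")
)

# delimitedList matches group (',' group)* and drops the commas from the output.
total = pars.delimitedList(group)
result = total.parseString("...,?????,..,??")

# Every element is now a named group, so we can pair names and tokens directly.
pairs = [(r.getName(), r.asList()) for r in result]
for name, tokens in pairs:
    print(name, tokens)
```

Note that, unlike the original grammar, this version requires a comma between consecutive groups, because the delimiter is no longer optional.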