I have a contrived example of a problem that I am facing:
import pyparsing as pp
fname = pp.OneOrMore( pp.Word("Max") ).setResultsName("fname")
mname = pp.OneOrMore( pp.Word("Joseph") ).setResultsName("mname")
lname = pp.OneOrMore( pp.Word("Andrews") ).setResultsName("lname")
another_mname = pp.OneOrMore(pp.Word("Miller")).setResultsName("mname")
full = fname + mname + lname + another_mname
output = full.parseString("Max Joseph Joseph Andrews Miller Miller").asDict()
# current output
{'fname': ['Max'], 'lname': ['Andrews'], 'mname': ['Miller', 'Miller']}
It is obvious why the output is the way it is. However, I would also like to collect 'Joseph' as another value, e.g.
# desired output
{'fname': ['Max'], 'lname': ['Andrews'], 'mname': ['Joseph', 'Joseph', 'Miller', 'Miller']}
Thanks.
Your code does not work as intended because you gave two results the same name. This causes the "mname" entry associated with mname in the resulting dict to be replaced by the "mname" entry associated with another_mname.
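The overwriting can be pictured with a plain-dict analogy. This is only an illustration of last-write-wins versus accumulating behaviour, not how pyparsing is implemented internally; the `matches` list and both helper names are made up for the example.

```python
# Plain-dict analogy (not pyparsing internals): each tuple is
# (results name, tokens matched), in the order the sub-expressions match.
matches = [
    ("fname", ["Max"]),
    ("mname", ["Joseph", "Joseph"]),
    ("lname", ["Andrews"]),
    ("mname", ["Miller", "Miller"]),
]


def last_wins(pairs):
    """Later entries replace earlier ones that share a name."""
    result = {}
    for name, tokens in pairs:
        result[name] = list(tokens)
    return result


def accumulate(pairs):
    """Entries that share a name are concatenated instead."""
    result = {}
    for name, tokens in pairs:
        result.setdefault(name, []).extend(tokens)
    return result


print(last_wins(matches))
# {'fname': ['Max'], 'mname': ['Miller', 'Miller'], 'lname': ['Andrews']}
print(accumulate(matches))
# {'fname': ['Max'], 'mname': ['Joseph', 'Joseph', 'Miller', 'Miller'], 'lname': ['Andrews']}
```

The first result mirrors the current output in the question, the second the desired one.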
One workaround would be to collect the names into two separate results and join them afterwards:
import pyparsing as pp
fname = pp.OneOrMore(pp.Word("Max"))("fname")
mname = pp.OneOrMore(pp.Word("Joseph"))("mname")
lname = pp.OneOrMore(pp.Word("Andrews"))("lname")
another_mname = pp.OneOrMore(pp.Word("Miller"))("mname2")
full = fname + mname + lname + another_mname
output = full.parseString("Max Joseph Joseph Andrews Miller Miller").asDict()
print(output)
# {'fname': ['Max'], 'mname': ['Joseph', 'Joseph'], 'lname': ['Andrews'], 'mname2': ['Miller', 'Miller']}
# clean-up dict
output['mname'] = output['mname'] + output['mname2']
del output['mname2']
print(output)
# {'fname': ['Max'], 'mname': ['Joseph', 'Joseph', 'Miller', 'Miller'], 'lname': ['Andrews']}
Note that you cannot simply define mname to be:
mname = pp.OneOrMore(pp.Word("Joseph") | pp.Word("Miller"))("mname")
This would lead to a similar issue:
import pyparsing as pp
fname = pp.OneOrMore(pp.Word("Max"))("fname")
mname = pp.OneOrMore(pp.Word("Joseph") | pp.Word("Miller"))("mname")
lname = pp.OneOrMore(pp.Word("Andrews"))("lname")
full = fname + mname + lname + mname
output = full.parseString("Max Joseph Joseph Andrews Miller Miller").asDict()
print(output)
# {'fname': ['Max'], 'mname': ['Miller', 'Miller'], 'lname': ['Andrews']}
but for a different reason: now the mname at the end of full is replacing the previous value of mname.
One could also automate this, e.g.
import pyparsing as pp
fname = pp.OneOrMore(pp.Word("Max"))("fname")
mname = pp.OneOrMore(pp.Word("Joseph"))("mname:0")
lname = pp.OneOrMore(pp.Word("Andrews"))("lname")
another_mname = pp.OneOrMore(pp.Word("Miller"))("mname:1")
full = fname + mname + lname + another_mname
output = full.parseString("Max Max Joseph Joseph Andrews Miller Miller").asDict()
print(output)
# {'fname': ['Max', 'Max'], 'mname:0': ['Joseph', 'Joseph'], 'lname': ['Andrews'], 'mname:1': ['Miller', 'Miller']}
def quench(pp_dict, mapping=lambda k: k.split(':')[0]):
    # merge entries whose keys map to the same base name, e.g. "mname:0" and "mname:1"
    result = {}
    for k, v in pp_dict.items():
        new_k = mapping(k)
        if k != new_k:
            if new_k not in result:
                result[new_k] = []
            result[new_k].extend(v)
        else:
            result[k] = v
    return result
print(quench(output))
# {'fname': ['Max', 'Max'], 'mname': ['Joseph', 'Joseph', 'Miller', 'Miller'], 'lname': ['Andrews']}
Or, even more mechanically, one could preprocess full to automatically convert multiple "mname" instances to numbered ones (e.g. "mname:0"), to be quenched later.
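A minimal sketch of the numbering step, in plain Python. The helper name number_duplicates and its sep parameter are hypothetical; it only generates the labels and does not touch pyparsing objects.

```python
from collections import Counter


def number_duplicates(names, sep=":"):
    """Append a running index to every name that occurs more than once."""
    totals = Counter(names)   # how often each name occurs overall
    seen = Counter()          # how often each name has been emitted so far
    labels = []
    for name in names:
        if totals[name] > 1:
            labels.append(f"{name}{sep}{seen[name]}")
            seen[name] += 1
        else:
            labels.append(name)
    return labels


print(number_duplicates(["fname", "mname", "lname", "mname"]))
# ['fname', 'mname:0', 'lname', 'mname:1']
```

Each label could then be attached to its expression with expr(label) before the expressions are added together, and the parsed dict passed through quench() as before.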
(As pointed out by @PaulMcG) This mechanism is implemented in pyparsing directly, via the listAllMatches parameter of setResultsName:
import pyparsing as pp
fname = pp.OneOrMore(pp.Word("Max")).setResultsName("fname")
mname = pp.OneOrMore(pp.Word("Joseph")).setResultsName("mname", listAllMatches=True)
lname = pp.OneOrMore(pp.Word("Andrews")).setResultsName("lname")
another_mname = pp.OneOrMore(pp.Word("Miller")).setResultsName("mname", listAllMatches=True)
full = fname + mname + lname + another_mname
output = full.parseString("Max Joseph Joseph Andrews Miller Miller").asDict()
print(output)
# {'fname': ['Max'], 'mname': [['Joseph', 'Joseph'], ['Miller', 'Miller']], 'lname': ['Andrews']}
or even like this:
import pyparsing as pp
fname = pp.OneOrMore(pp.Word("Max")).setResultsName("fname")
mname = pp.OneOrMore(pp.Word("Joseph") | pp.Word("Miller")).setResultsName("mname", listAllMatches=True)
lname = pp.OneOrMore(pp.Word("Andrews")).setResultsName("lname")
full = fname + mname + lname + mname
output = full.parseString("Max Joseph Joseph Andrews Miller Miller").asDict()
print(output)
# {'fname': ['Max'], 'mname': [['Joseph', 'Joseph'], ['Miller', 'Miller']], 'lname': ['Andrews']}
although the result is a list of lists and not a single flattened one.
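If a flat list is needed, the nested value can be flattened by one level after parsing. The sketch below works on the dict shown above rather than calling pyparsing. The helper name flatten_groups is made up for the example, and it assumes every element under the key is itself a list; a bare string would be split into characters.

```python
from itertools import chain


def flatten_groups(parsed, key):
    """Return a copy of parsed with parsed[key] flattened by one level."""
    flat = dict(parsed)  # shallow copy, the input dict is left untouched
    flat[key] = list(chain.from_iterable(parsed[key]))
    return flat


# the dict printed by the listAllMatches=True examples above
output = {'fname': ['Max'], 'mname': [['Joseph', 'Joseph'], ['Miller', 'Miller']], 'lname': ['Andrews']}
print(flatten_groups(output, 'mname'))
# {'fname': ['Max'], 'mname': ['Joseph', 'Joseph', 'Miller', 'Miller'], 'lname': ['Andrews']}
```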