asDict method in pyparsing overrides previous key

I have a contrived example of a problem that I am facing:

import pyparsing as pp

fname = pp.OneOrMore( pp.Word("Max") ).setResultsName("fname")
mname = pp.OneOrMore( pp.Word("Joseph") ).setResultsName("mname")
lname = pp.OneOrMore( pp.Word("Andrews") ).setResultsName("lname")
another_mname = pp.OneOrMore(pp.Word("Miller")).setResultsName("mname")

full = fname + mname + lname + another_mname

output = full.parseString("Max Joseph Joseph Andrews Miller Miller").asDict()

# current output 
{'fname': ['Max'], 'lname': ['Andrews'], 'mname': ['Miller', 'Miller']}

It is obvious why the output is the way it is. However, I would also like to collect 'Joseph' as another value, e.g.:

# desired output
{'fname': ['Max'], 'lname': ['Andrews'], 'mname': ['Joseph', 'Joseph', 'Miller', 'Miller']}

Thanks.

Solution

Your code does not work because you give two results the same name. This causes the "mname" entry associated with mname in the resulting dict to be replaced by the "mname" entry associated with another_mname.
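As a rough analogy only (this is plain Python, not pyparsing's internal implementation), the overwrite behaves like assigning the same dict key twice, whereas the desired output needs the values to accumulate. The matches list below is a hypothetical stand-in for the named results produced during parsing:

```python
from collections import defaultdict

# Hypothetical stand-in for the (results name, tokens) pairs seen while parsing.
matches = [("mname", ["Joseph", "Joseph"]), ("mname", ["Miller", "Miller"])]

# Last write wins: each assignment to the same key discards the earlier value.
overwritten = {}
for name, tokens in matches:
    overwritten[name] = tokens

# Accumulating instead keeps every match under the shared key.
accumulated = defaultdict(list)
for name, tokens in matches:
    accumulated[name].extend(tokens)

print(overwritten)        # {'mname': ['Miller', 'Miller']}
print(dict(accumulated))  # {'mname': ['Joseph', 'Joseph', 'Miller', 'Miller']}
```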

One workaround would be to collect the names into two separate results and join them afterwards:

import pyparsing as pp

fname = pp.OneOrMore(pp.Word("Max"))("fname")
mname = pp.OneOrMore(pp.Word("Joseph"))("mname")
lname = pp.OneOrMore(pp.Word("Andrews"))("lname")
another_mname = pp.OneOrMore(pp.Word("Miller"))("mname2")

full = fname + mname + lname + another_mname

output = full.parseString("Max Joseph Joseph Andrews Miller Miller").asDict()
print(output)
# {'fname': ['Max'], 'mname': ['Joseph', 'Joseph'], 'lname': ['Andrews'], 'mname2': ['Miller', 'Miller']}

# clean-up dict
output['mname'] = output['mname'] + output['mname2']
del output['mname2']

print(output)
# {'fname': ['Max'], 'mname': ['Joseph', 'Joseph', 'Miller', 'Miller'], 'lname': ['Andrews']}

Note that you cannot simply define mname to be:

mname = pp.OneOrMore(pp.Word("Joseph") | pp.Word("Miller"))("mname")

This would lead to a similar issue:

import pyparsing as pp

fname = pp.OneOrMore(pp.Word("Max"))("fname")
mname = pp.OneOrMore(pp.Word("Joseph") | pp.Word("Miller"))("mname")
lname = pp.OneOrMore(pp.Word("Andrews"))("lname")

full = fname + mname + lname + mname

output = full.parseString("Max Joseph Joseph Andrews Miller Miller").asDict()
print(output)
# {'fname': ['Max'], 'mname': ['Miller', 'Miller'], 'lname': ['Andrews']}

but for a different reason: now the mname at the end of full is replacing the previous value of mname.

One could also automate this, e.g.:

import pyparsing as pp

fname = pp.OneOrMore(pp.Word("Max"))("fname")
mname = pp.OneOrMore(pp.Word("Joseph"))("mname:0")
lname = pp.OneOrMore(pp.Word("Andrews"))("lname")
another_mname = pp.OneOrMore(pp.Word("Miller"))("mname:1")

full = fname + mname + lname + another_mname

output = full.parseString("Max Max Joseph Joseph Andrews Miller Miller").asDict()
print(output)
# {'fname': ['Max', 'Max'], 'mname:0': ['Joseph', 'Joseph'], 'lname': ['Andrews'], 'mname:1': ['Miller', 'Miller']}


def quench(pp_dict, mapping=lambda k: k.split(':')[0]):
    # Merge entries whose keys map to the same base name,
    # e.g. "mname:0" and "mname:1" are both collected under "mname".
    result = {}
    for k, v in pp_dict.items():
        new_k = mapping(k)
        if k != new_k:
            result.setdefault(new_k, []).extend(v)
        else:
            result[k] = v
    return result


print(quench(output))
# {'fname': ['Max', 'Max'], 'mname': ['Joseph', 'Joseph', 'Miller', 'Miller'], 'lname': ['Andrews']}

Or, with even less manual effort, one could preprocess full so that repeated "mname" result names are automatically converted to numbered ones (e.g. "mname:0"), to be quenched later.
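One possible way to generate the numbered names is sketched below. It does not rewrite an existing full expression; it only hands out "name:index" strings at definition time. The helper make_numbered_namer is hypothetical and not part of pyparsing:

```python
from collections import defaultdict


def make_numbered_namer(sep=":"):
    """Return a function mapping a base name to "name:0", "name:1", ... (hypothetical helper)."""
    counts = defaultdict(int)

    def numbered(name):
        # Hand out the next free index for this base name.
        index = counts[name]
        counts[name] += 1
        return f"{name}{sep}{index}"

    return numbered


numbered = make_numbered_namer()
names = [numbered("fname"), numbered("mname"), numbered("lname"), numbered("mname")]
print(names)
# ['fname:0', 'mname:0', 'lname:0', 'mname:1']
```

Each expression would then be named with, for example, pp.OneOrMore(pp.Word("Joseph"))(numbered("mname")). Every key gets a suffix this way, including the unique ones, and the default mapping of quench above strips it again.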

EDIT

(as pointed out by @PaulMcG)

This mechanism is implemented in pyparsing directly, through the listAllMatches=True argument of setResultsName:

import pyparsing as pp

fname = pp.OneOrMore(pp.Word("Max")).setResultsName("fname")
mname = pp.OneOrMore(pp.Word("Joseph")).setResultsName("mname", listAllMatches=True)
lname = pp.OneOrMore(pp.Word("Andrews")).setResultsName("lname")
another_mname = pp.OneOrMore(pp.Word("Miller")).setResultsName("mname", listAllMatches=True)

full = fname + mname + lname + another_mname

output = full.parseString("Max Joseph Joseph Andrews Miller Miller").asDict()
print(output)
# {'fname': ['Max'], 'mname': [['Joseph', 'Joseph'], ['Miller', 'Miller']], 'lname': ['Andrews']}

or even like this:

import pyparsing as pp

fname = pp.OneOrMore(pp.Word("Max")).setResultsName("fname")
mname = pp.OneOrMore(pp.Word("Joseph") | pp.Word("Miller")).setResultsName("mname", listAllMatches=True)
lname = pp.OneOrMore(pp.Word("Andrews")).setResultsName("lname")

full = fname + mname + lname + mname

output = full.parseString("Max Joseph Joseph Andrews Miller Miller").asDict()
print(output)
# {'fname': ['Max'], 'mname': [['Joseph', 'Joseph'], ['Miller', 'Miller']], 'lname': ['Andrews']}

In both cases, though, the result is a list of lists rather than a single flattened list.
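If the flat form is needed, the nested list can be flattened by one level after calling asDict(). A minimal sketch, assuming the dict has exactly the shape printed above (flatten_key is a hypothetical helper name, and the dict literal stands in for the parser output):

```python
from itertools import chain


def flatten_key(parsed, key):
    """Return a shallow copy of parsed with parsed[key] flattened by one level."""
    flat = dict(parsed)
    flat[key] = list(chain.from_iterable(parsed[key]))
    return flat


# Same shape as the asDict() output shown above.
output = {'fname': ['Max'], 'mname': [['Joseph', 'Joseph'], ['Miller', 'Miller']], 'lname': ['Andrews']}

print(flatten_key(output, 'mname'))
# {'fname': ['Max'], 'mname': ['Joseph', 'Joseph', 'Miller', 'Miller'], 'lname': ['Andrews']}
```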