
asDict method in pyparsing overrides previous key


I have a contrived example of a problem that I am facing:

import pyparsing as pp

fname = pp.OneOrMore( pp.Word("Max") ).setResultsName("fname")
mname = pp.OneOrMore( pp.Word("Joseph") ).setResultsName("mname")
lname = pp.OneOrMore( pp.Word("Andrews") ).setResultsName("lname")
another_mname = pp.OneOrMore(pp.Word("Miller")).setResultsName("mname")

full = fname + mname + lname + another_mname

output = full.parseString("Max Joseph Joseph Andrews Miller Miller").asDict()

# current output 
{'fname': ['Max'], 'lname': ['Andrews'], 'mname': ['Miller', 'Miller']}

It is clear why the output looks like this. However, I would also like to collect 'Joseph' under the same key, e.g.:

# desired output
{'fname': ['Max'], 'lname': ['Andrews'], 'mname': ['Joseph', 'Joseph', 'Miller', 'Miller']}

Thanks.


Solution

  • Your code does not work as intended because two expressions share the same results name. As a result, the "mname" entry produced by mname is replaced in the resulting dict by the "mname" entry produced by another_mname.

    One workaround is to collect the names under two separate results names and join them afterwards:

    import pyparsing as pp
    
    fname = pp.OneOrMore(pp.Word("Max"))("fname")
    mname = pp.OneOrMore(pp.Word("Joseph"))("mname")
    lname = pp.OneOrMore(pp.Word("Andrews"))("lname")
    another_mname = pp.OneOrMore(pp.Word("Miller"))("mname2")
    
    full = fname + mname + lname + another_mname
    
    output = full.parseString("Max Joseph Joseph Andrews Miller Miller").asDict()
    print(output)
    # {'fname': ['Max'], 'mname': ['Joseph', 'Joseph'], 'lname': ['Andrews'], 'mname2': ['Miller', 'Miller']}
    
    # clean-up dict
    output['mname'] = output['mname'] + output['mname2']
    del output['mname2']
    
    print(output)
    # {'fname': ['Max'], 'mname': ['Joseph', 'Joseph', 'Miller', 'Miller'], 'lname': ['Andrews']}
    

    Note that you cannot simply define mname to be:

    mname = pp.OneOrMore(pp.Word("Joseph") | pp.Word("Miller"))("mname")
    

    This would lead to a similar issue:

    import pyparsing as pp
    
    fname = pp.OneOrMore(pp.Word("Max"))("fname")
    mname = pp.OneOrMore(pp.Word("Joseph") | pp.Word("Miller"))("mname")
    lname = pp.OneOrMore(pp.Word("Andrews"))("lname")
    
    full = fname + mname + lname + mname
    
    output = full.parseString("Max Joseph Joseph Andrews Miller Miller").asDict()
    print(output)
    # {'fname': ['Max'], 'mname': ['Miller', 'Miller'], 'lname': ['Andrews']}
    

    but for a slightly different reason: the same mname expression now appears twice in full, so its second match replaces the value stored by the first.


    One could also automate this, e.g.:

    import pyparsing as pp
    
    fname = pp.OneOrMore(pp.Word("Max"))("fname")
    mname = pp.OneOrMore(pp.Word("Joseph"))("mname:0")
    lname = pp.OneOrMore(pp.Word("Andrews"))("lname")
    another_mname = pp.OneOrMore(pp.Word("Miller"))("mname:1")
    
    full = fname + mname + lname + another_mname
    
    output = full.parseString("Max Max Joseph Joseph Andrews Miller Miller").asDict()
    print(output)
    # {'fname': ['Max', 'Max'], 'mname:0': ['Joseph', 'Joseph'], 'lname': ['Andrews'], 'mname:1': ['Miller', 'Miller']}
    
    
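    def quench(pp_dict, mapping=lambda k: k.split(':')[0]):
        """Merge entries whose keys map to the same name ('mname:0', 'mname:1' -> 'mname')."""
        result = {}
        for k, v in pp_dict.items():
            new_k = mapping(k)
            if k != new_k:
                # numbered key: append its values to the merged entry
                result.setdefault(new_k, []).extend(v)
            else:
                # plain key: keep the value unchanged
                result[k] = v
        return result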
    
    
    print(quench(output))
    # {'fname': ['Max', 'Max'], 'mname': ['Joseph', 'Joseph', 'Miller', 'Miller'], 'lname': ['Andrews']}
    

    Or, with even less manual effort, one could preprocess full so that repeated "mname" results names are automatically renamed to numbered ones (e.g. "mname:0"), to be quenched later.
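
    A simpler variation on that idea is to generate the numbered names when the expressions are built, rather than rewriting full afterwards. The following is only a sketch: NameNumberer is a hypothetical helper, not part of pyparsing, and it numbers every name it is asked for (including unique ones), relying on quench() above to strip the suffixes again.

```python
from collections import defaultdict
from itertools import count


class NameNumberer:
    """Hypothetical helper: hands out 'name:0', 'name:1', ... per base name."""

    def __init__(self, sep=":"):
        self._sep = sep
        # one independent counter (starting at 0) for each base name
        self._counters = defaultdict(count)

    def __call__(self, name):
        return f"{name}{self._sep}{next(self._counters[name])}"


numbered = NameNumberer()
names = [numbered("mname"), numbered("lname"), numbered("mname")]
print(names)
# ['mname:0', 'lname:0', 'mname:1']

# Intended use with pyparsing (not executed here):
#   mname = pp.OneOrMore(pp.Word("Joseph"))(numbered("mname"))
#   another_mname = pp.OneOrMore(pp.Word("Miller"))(numbered("mname"))
```

    The separator matches the ':' that the default mapping of quench() splits on; if a different separator is used, quench() needs a matching mapping.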


    EDIT

    (as pointed out by @PaulMcG)

    pyparsing supports this mechanism directly through the listAllMatches argument of setResultsName:

    import pyparsing as pp
    
    fname = pp.OneOrMore(pp.Word("Max")).setResultsName("fname")
    mname = pp.OneOrMore(pp.Word("Joseph")).setResultsName("mname", listAllMatches=True)
    lname = pp.OneOrMore(pp.Word("Andrews")).setResultsName("lname")
    another_mname = pp.OneOrMore(pp.Word("Miller")).setResultsName("mname", listAllMatches=True)
    
    full = fname + mname + lname + another_mname
    
    output = full.parseString("Max Joseph Joseph Andrews Miller Miller").asDict()
    print(output)
    # {'fname': ['Max'], 'mname': [['Joseph', 'Joseph'], ['Miller', 'Miller']], 'lname': ['Andrews']}
    

    or even like this:

    import pyparsing as pp
    
    fname = pp.OneOrMore(pp.Word("Max")).setResultsName("fname")
    mname = pp.OneOrMore(pp.Word("Joseph") | pp.Word("Miller")).setResultsName("mname", listAllMatches=True)
    lname = pp.OneOrMore(pp.Word("Andrews")).setResultsName("lname")
    
    full = fname + mname + lname + mname
    
    output = full.parseString("Max Joseph Joseph Andrews Miller Miller").asDict()
    print(output)
    # {'fname': ['Max'], 'mname': [['Joseph', 'Joseph'], ['Miller', 'Miller']], 'lname': ['Andrews']}
    

    although the result is a list of lists and not a single flattened one.
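
    If a single flat list is needed, the nested lists can be joined after parsing. A minimal sketch, assuming the dict has the shape printed above (flatten_key is a hypothetical helper name, and the dict literal stands in for the asDict() result):

```python
from itertools import chain


def flatten_key(parsed, key):
    """Return a copy of `parsed` with the nested lists under `key` joined into one list."""
    flat = dict(parsed)  # shallow copy, so the original dict is left untouched
    flat[key] = list(chain.from_iterable(parsed[key]))
    return flat


# Shape copied from the listAllMatches=True output shown above.
output = {
    'fname': ['Max'],
    'mname': [['Joseph', 'Joseph'], ['Miller', 'Miller']],
    'lname': ['Andrews'],
}

flat = flatten_key(output, 'mname')
print(flat)
# {'fname': ['Max'], 'mname': ['Joseph', 'Joseph', 'Miller', 'Miller'], 'lname': ['Andrews']}
```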