Need help with pyparsing: handling the += operator

We have been using pyparsing for a generic config file parser for some time now. The inner blocks of the config files look something like this:

 {
 key1 = [ value1.1, value1.2, value1.3 ];
 key2 = [ value2.1, value2.2, value2.3 ];
 }

Using dictOf and delimitedList, we end up with the equivalent of a dictionary mapping keys (key1 and key2) to the corresponding list of value tokens.
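The question doesn't show the existing parser, so here is a hypothetical reconstruction of what a dictOf/delimitedList grammar for the block above might look like. The expression names and the Group around each bracketed list are assumptions, not the asker's code. dictOf wraps each key + value pair in a Group and collects the pairs in a Dict, so each key becomes a results name:

```python
import pyparsing as pp

# hypothetical reconstruction: the question only says it uses dictOf and delimitedList
LBRACE, RBRACE, LBRACK, RBRACK, SEMI, EQ = map(pp.Suppress, "{}[];=")
key_expr = pp.Word(pp.alphas, pp.alphanums)
value_atom = pp.Word(pp.alphas, pp.alphanums + "._")
# Group keeps each bracketed list together as a single value
value_list = pp.Group(LBRACK + pp.delimitedList(value_atom) + RBRACK)

# dictOf groups each key + value pair and builds a Dict keyed on the first token
block = LBRACE + pp.dictOf(key_expr, EQ + value_list + SEMI) + RBRACE

config = """
 {
 key1 = [ value1.1, value1.2, value1.3 ];
 key2 = [ value2.1, value2.2, value2.3 ];
 }
"""

result = block.parseString(config)
print(result["key1"].asList())  # ['value1.1', 'value1.2', 'value1.3']
print(result["key2"].asList())  # ['value2.1', 'value2.2', 'value2.3']
```

Grouping the value list means every key maps to a list, even when there is only one value between the brackets.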

Recently, I was hoping to extend the parser to support:

 {
 key1 = [ value1.1, value1.2, value1.3 ];
 key1 += [ value1.4, value1.5 ];
 key2 = [ value2.1, value2.2, value2.3 ];
 }

In this example, I want the resulting dict to map key1 to [ value1.1, value1.2, value1.3, value1.4, value1.5 ]. Looking at the available pyparsing options, I didn't see any clear way to do this, and a Google search didn't turn up anything either (although I may not have known the right search terms).

Is there some hook for this that I am missing? Is there some post-processing step I should be doing to combine the entries? Can anyone suggest the best "pyparsing way" to approach this?

Thanks

Solution

Pyparsing blows up at this because you are essentially throwing a different grammar at it: nothing in your existing "key = [ ... ];" expressions can match a "+=" line.
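To see the failure in isolation, here is a minimal sketch of a single "key = [ ... ];" statement written in the same style as the grammar below (the name assignment is hypothetical). It accepts a definition, but on an update line the key matches and then the literal "=" cannot match the "+" that follows:

```python
import pyparsing as pp

# a single "key = [ ... ];" statement, mirroring an '='-only grammar (hypothetical)
LBRACK, RBRACK, SEMI = map(pp.Suppress, "[];")
key_expr = pp.Word(pp.alphas, pp.alphanums)
value_atom = pp.Word(pp.alphas, pp.alphanums + "._")
value_list = pp.Group(LBRACK + pp.delimitedList(value_atom) + RBRACK)
assignment = key_expr + "=" + value_list + SEMI

defn_tokens = assignment.parseString("key1 = [ value1.1, value1.2 ];")
print(defn_tokens.asList())  # ['key1', '=', ['value1.1', 'value1.2']]

try:
    assignment.parseString("key1 += [ value1.4, value1.5 ];")
    update_parsed = True
except pp.ParseException as err:
    # key_expr matches "key1", but the literal "=" cannot match the "+" after it
    update_parsed = False
    print("update line rejected:", err)
```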

There is nothing in pyparsing out of the box that will handle this, so you will need to roll your own special version of Dict: one that takes key_expr '=' value_expr ';' lines as definitions and understands that key_expr '+=' value_expr ';' lines are meant to modify a previously defined key. In pyparsing, you would do this with a parse action attached to the overall ZeroOrMore expression, which can contain both definitions and updates.

import pyparsing as pp

LBRACE, RBRACE, LBRACK, RBRACK, SEMI = map(pp.Suppress, "{}[];")
key_expr = pp.Word(pp.alphas, pp.alphanums)
value_atom = pp.Word(pp.alphas, pp.alphanums + '._')
value_list = LBRACK + pp.delimitedList(value_atom) + RBRACK

key_defn = pp.Group(key_expr("key") + '=' + value_list("value") + SEMI)
key_update = pp.Group(key_expr("key") + '+=' + value_list("value") + SEMI)

# a trailing '*' on a results name (same as setResultsName(name, listAllMatches=True))
# keeps every match under that name instead of only the last one; here it sorts
# the "x = [...]" definitions into "defns" and the "x += [...]" updates into "updates"
key_values = pp.ZeroOrMore(key_defn("defns*") | key_update("updates*"))

# parse action to build a dict beginning with all definitions, and then
# adding updates (an update to a key that was never defined raises KeyError)
def assemble_dict(t):
    ret = {kv.key: kv.value.asList() for kv in t.defns}
    for kv in t.updates:
        ret[kv.key] += kv.value.asList()
    return ret
key_values.addParseAction(assemble_dict)

kv_expr = LBRACE + key_values("vars") + RBRACE

test = """
 {
 key1 = [ value1.1, value1.2, value1.3 ];
 key1 += [ value1.4, value1.5 ];
 key2 = [ value2.1, value2.2, value2.3 ];
 }
"""

print(kv_expr.parseString(test).dump())

prints:

[{'key1': ['value1.1', 'value1.2', 'value1.3', 'value1.4', 'value1.5'], 'key2': ['value2.1', 'value2.2', 'value2.3']}]
- vars: {'key1': ['value1.1', 'value1.2', 'value1.3', 'value1.4', 'value1.5'], 'key2': ['value2.1', 'value2.2', 'value2.3']}
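The trailing '*' is what lets assemble_dict iterate over t.defns and t.updates. A small standalone sketch, using made-up word and number expressions rather than the config grammar, contrasts a plain results name, which only keeps the last match, with a '*' name, which keeps them all:

```python
import pyparsing as pp

word = pp.Word(pp.alphas)
num = pp.Word(pp.nums)

# without '*', a repeated results name keeps only the last match
last_only = pp.ZeroOrMore(word("w") | num("n"))
# with '*', every match is collected under the name (listAllMatches=True)
all_matches = pp.ZeroOrMore(word("w*") | num("n*"))

text = "a 1 b 2 c"
r1 = last_only.parseString(text)
r2 = all_matches.parseString(text)
print(r1["w"], r1["n"])                    # c 2
print(r2["w"].asList(), r2["n"].asList())  # ['a', 'b', 'c'] ['1', '2']
```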

If you later want to add support for things like "key4 = key2 + key3" or "key4 += key2", you will need to revisit the expressions used to parse a key-value pair, and then extend the key_values expression and the assemble_dict parse action accordingly.
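As a rough sketch of that extension, assuming references may only point at keys defined earlier in the block: an operand is either a bracketed list or a key name, operands are joined with '+', and a single parse action applies the statements in source order (unlike assemble_dict above, which applies all definitions before any updates). The names operand, value_expr, statement and the results names "literal", "ref" and "op" are hypothetical, and error handling is deliberately minimal:

```python
import pyparsing as pp

LBRACE, RBRACE, LBRACK, RBRACK, SEMI = map(pp.Suppress, "{}[];")
key_expr = pp.Word(pp.alphas, pp.alphanums)
value_atom = pp.Word(pp.alphas, pp.alphanums + "._")
value_list = pp.Group(LBRACK + pp.delimitedList(value_atom) + RBRACK)

# hypothetical: an operand is a literal list or a reference to an earlier key
operand = pp.Group(value_list("literal") | key_expr("ref"))
# operands may be concatenated with '+', e.g. "key4 = key1 + key2;"
value_expr = pp.Group(operand + pp.ZeroOrMore(pp.Suppress("+") + operand))
statement = pp.Group(key_expr("key") + pp.oneOf("+= =")("op") + value_expr("value") + SEMI)
key_values = pp.ZeroOrMore(statement)


def assemble_dict(t):
    ret = {}
    # apply statements top to bottom so references see earlier updates
    for stmt in t:
        values = []
        for part in stmt.value:
            if part.ref:
                # copy the referenced list; KeyError if it was never defined
                values.extend(ret[part.ref])
            else:
                values.extend(part.literal.asList())
        if stmt.op == "+=":
            ret[stmt.key].extend(values)  # KeyError if the key was never defined
        else:
            ret[stmt.key] = values
    return ret


key_values.addParseAction(assemble_dict)
kv_expr = LBRACE + key_values + RBRACE

config = """
 {
 key1 = [ value1.1, value1.2 ];
 key1 += [ value1.3 ];
 key2 = [ value2.1 ];
 key3 = [ value3.1 ];
 key4 = key1 + key2;
 key4 += key3;
 }
"""

# the parse action returns a plain dict, which becomes the single parsed token
config_dict = kv_expr.parseString(config)[0]
print(config_dict["key4"])
# ['value1.1', 'value1.2', 'value1.3', 'value2.1', 'value3.1']
```

Once references are allowed, the order in which statements are applied changes the result: with a definitions-first approach, key4 would be built before key1's += update and would miss value1.3. That is why this sketch applies them top to bottom.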