We have been using pyparsing for a generic config file parser for some time now. The inner blocks of the config parser look something like this:
{
key1 = [ value1.1, value1.2, value1.3 ];
key2 = [ value2.1, value2.2, value2.3 ];
}
Using dictOf and delimitedList, we end up with the equivalent of a dictionary mapping keys (key1 and key2) to the corresponding list of value tokens.
Recently, I was hoping to extend the parser to support:
{
key1 = [ value1.1, value1.2, value1.3 ];
key1 += [ value1.4, value1.5 ];
key2 = [ value2.1, value2.2, value2.3 ];
}
In this example, I want the resulting dict to map key1 to [ value1.1, value1.2, value1.3, value1.4, value1.5 ]. Looking at the pyparsing options available, I didn't see any clear way to do this. A Google search didn't appear to turn up anything either. (Although it is possible I didn't know which search words to use for this.)
Is there some hook for this that I am missing? Is there some post processing combine functionality I should be doing? Can anyone suggest what the best "pyparsing way" of approaching this would be?
Thanks
Pyparsing blows up at this because you are essentially throwing a different grammar at it.
There is nothing in pyparsing out-of-the-box that will handle this, so you will need to roll your own special version of Dict that will take key_expr '=' value_expr ';' lines and comprehend that key_expr '+=' value_expr ';' lines are intended to modify a previously defined key. In pyparsing, you would do this with a parse action that is attached to the overall ZeroOrMore expression, which may contain both definitions and updates.
import pyparsing as pp
LBRACE,RBRACE,LBRACK,RBRACK,SEMI = map(pp.Suppress, "{}[];")
key_expr = pp.Word(pp.alphas, pp.alphanums)
value_atom = pp.Word(pp.alphas, pp.alphanums + '._')
value_list = LBRACK + pp.delimitedList(value_atom) + RBRACK
key_defn = pp.Group(key_expr("key") + '=' + value_list("value") + SEMI)
key_update = pp.Group(key_expr("key") + '+=' + value_list("value") + SEMI)
# using the trailing '*' will support saving multiple expressions under the same results name
# in this case, it will sort out the "x = []" definitions vs "x += []" updates
key_values = pp.ZeroOrMore(key_defn("defns*") | key_update("updates*"))
# parse action to build a dict beginning with all definitions, and then
# adding updates
def assemble_dict(t):
    ret = {kv.key: kv.value.asList() for kv in t.defns}
    for kv in t.updates:
        ret[kv.key] += kv.value.asList()
    return ret
key_values.addParseAction(assemble_dict)
kv_expr = LBRACE + key_values("vars") + RBRACE
test = """
{
key1 = [ value1.1, value1.2, value1.3 ];
key1 += [ value1.4, value1.5 ];
key2 = [ value2.1, value2.2, value2.3 ];
}
"""
print(kv_expr.parseString(test).dump())
prints:
[{'key1': ['value1.1', 'value1.2', 'value1.3', 'value1.4', 'value1.5'], 'key2': ['value2.1', 'value2.2', 'value2.3']}]
- vars: {'key1': ['value1.1', 'value1.2', 'value1.3', 'value1.4', 'value1.5'], 'key2': ['value2.1', 'value2.2', 'value2.3']}
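Note that assemble_dict applies all of the "=" definitions first and then all of the "+=" updates, so the relative order of the statements in the file is not taken into account, and a "+=" on a key that was never defined raises a KeyError. If statement order matters for your config files, one alternative is to fold the statements in source order. The following is only a minimal sketch of that merging step in plain Python, not part of pyparsing: the function name merge_statements and the (key, operator, values) tuple layout are assumptions made up for illustration.

```python
def merge_statements(statements):
    """Fold (key, operator, values) triples into a dict, in source order.

    Hypothetical helper: '=' (re)defines a key, '+=' extends an existing one.
    """
    merged = {}
    for key, op, values in statements:
        if op == "=":
            # a definition replaces whatever was stored before
            merged[key] = list(values)
        elif op == "+=":
            # an update is only valid for a key that already exists
            if key not in merged:
                raise KeyError(f"'{key}' updated with += before being defined")
            merged[key].extend(values)
        else:
            raise ValueError(f"unsupported operator: {op!r}")
    return merged


statements = [
    ("key1", "=", ["value1.1", "value1.2", "value1.3"]),
    ("key1", "+=", ["value1.4", "value1.5"]),
    ("key2", "=", ["value2.1", "value2.2", "value2.3"]),
]
print(merge_statements(statements))
```

To call something like this from a parse action, each grouped statement would need to keep its operator in the parsed tokens (for example under its own results name) so that the triples can be built from the tokens in the order they were matched.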
If you find that you later want to add support for things like "key4 = key2 + key3" or "key4 += key2", you will need to revisit the expressions used to parse a key-value pair, and then extend the key_values expression and the assemble_dict parse action accordingly.
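As a rough idea of what the extended parse action might have to do for right-hand sides such as "key2 + key3", here is a sketch of the evaluation step only, in plain Python. It assumes the grammar has already been extended so that each right-hand side arrives as a sequence of terms, where a string is a reference to a previously defined key and a list is a literal value list. The name evaluate_terms, that term layout, and the value3.x sample values are all hypothetical.

```python
def evaluate_terms(terms, env):
    """Concatenate the terms of a right-hand side into one flat list.

    Hypothetical layout: a str term names a key already present in env,
    a list term is a literal list of values.
    """
    out = []
    for term in terms:
        if isinstance(term, str):
            # key reference: copy the values currently stored for that key
            # (raises KeyError if the key has not been defined yet)
            out.extend(env[term])
        else:
            # literal value list
            out.extend(term)
    return out


env = {"key2": ["value2.1"], "key3": ["value3.1", "value3.2"]}
env["key4"] = evaluate_terms(["key2", "key3"], env)   # key4 = key2 + key3
env["key4"] += evaluate_terms(["key2"], env)          # key4 += key2
print(env["key4"])
```

Because references are resolved against the values known at that point, this kind of evaluation only makes sense if the statements are processed in the order they appear in the file.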