Pyparsing - Trouble parsing file to dictionary structure


I am attempting to use Pyparsing to parse ASPARTIX (.apx) format files (http://www.dbai.tuwien.ac.at/research/project/argumentation/systempage/docu.htm), and I am having trouble structuring my results into a dictionary.

I have specified the grammar as follows:

from pyparsing import *

ID = Word(alphanums)
arg_pair = Group(ID + Suppress(',') + ID)
value = Word(nums)
lineEnd = Suppress(').')
arg = Suppress('arg(') + ID + lineEnd
attack = Suppress('att(') + arg_pair + lineEnd
pref = Suppress('pref(') + arg_pair + lineEnd
val = Suppress('val(') + ID + Suppress(',') + value + lineEnd
valpref = Suppress('valpref(') + value + Suppress(',') + value +  lineEnd
support = Suppress('support(') + arg_pair + lineEnd

apx = OneOrMore(arg.setName('arg') | attack.setName('att') | pref.setName('pref') | val.setName('val') | valpref.setName('valpref') | support.setName('support'))

I am unsure how to use the setName() function to define dictionary keys so that each occurrence of the arg, attack, etc. rules maps to the defined key. The code above yields no usable dictionary keys.

For example:

"""arg(a).
arg(b).
att(a,b)."""

Would map to:

{"arg": ["a","b"], "att":[["a","b"]]}

I would appreciate any help you can give.


Solution

  • A few other comments on your parser:

    • Generally, you should avoid defining literals that combine keywords with related punctuation. For instance, defining arg as Suppress('arg(') will look specifically for "arg(", failing if there is any whitespace between the keyword and the opening parenthesis. Instead, I recommend defining your keywords using the Keyword class. You can suppress these if you like, but Keyword will enforce complete matching of the word, and protect against accidentally matching the leading 'val' of 'valpref'.

    • Defining ID as Word(alphanums) will open the door for confusion between ID and integer values. I expect identifiers will always at least start with an alphabetic character, so you can use the 2-argument form of Word to specify alphas only as the set of allowed leading characters, and the alphanums as the set of allowed body characters.

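    • See my comment on your post re: setName() vs setResultsName(). In short, setName() only labels an expression for error and debugging messages; setResultsName() (or the shortcut expr("name") used in the code below) is what attaches a key to the parsed results.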
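
The three points above can be checked in isolation. The following is a minimal sketch, assuming a reasonably recent pyparsing install; the helper `matches` and the sample strings are made up for illustration and are not part of the original parser.

```python
from pyparsing import Keyword, Literal, ParseException, Word, alphanums, alphas


def matches(expr, text):
    """Hypothetical helper: True if expr parses the start of text."""
    try:
        expr.parseString(text)
        return True
    except ParseException:
        return False


# Literal happily matches the leading "val" of "valpref";
# Keyword requires that the word ends there.
literal_matches_prefix = matches(Literal("val"), "valpref(1,2).")
keyword_matches_prefix = matches(Keyword("val"), "valpref(1,2).")
keyword_matches_word = matches(Keyword("val"), "val(a,1).")

# Two-argument Word: first character from alphas, the rest from alphanums.
ID = Word(alphas, alphanums)
id_accepts_name = matches(ID, "a1")
id_accepts_number = matches(ID, "42")

# setName() only labels the expression; setResultsName() creates a key.
labelled = Word(alphas).setName("identifier")
keyed = Word(alphas).setResultsName("ident")
labelled_keys = labelled.parseString("abc").asDict()
keyed_keys = keyed.parseString("abc").asDict()
```

Here only `keyed` produces a named result; `labelled` parses the same text but leaves the results without any keys, which is the behaviour described in the question.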

    I retooled your parser slightly so that all commands have the same keys: "cmd" and "args". This allows you to write polymorphic code, like the for-loop at the end of this sample.

    from pyparsing import *
    LPAR,RPAR,DOT,COMMA = map(Suppress,"().,")
    arg,attack,pref,val,valpref,support = map(Keyword, 
        "arg att pref val valpref support".split())
    
    ID = Word(alphas, alphanums)
    id_pair = Group(ID + COMMA + ID)
    integer = Word(nums)
    int_pair = Group(integer + COMMA + integer)
    
    arg_cmd = Group(arg("cmd") + LPAR + ID("args") + RPAR)
    attack_cmd = Group(attack("cmd") + LPAR + id_pair("args") + RPAR)
    pref_cmd = Group(pref("cmd") + LPAR + id_pair("args") + RPAR)
    val_cmd = Group(val("cmd") + LPAR + Group(ID + COMMA + integer)("args") + RPAR)
    valpref_cmd = Group(valpref("cmd") + LPAR + int_pair("args") + RPAR)
    support_cmd = Group(support("cmd") + LPAR + id_pair("args") + RPAR)
    
    apx = OneOrMore((arg_cmd | attack_cmd | pref_cmd | val_cmd | valpref_cmd | support_cmd) + DOT)
    
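    # apxSource is the text to parse; here, the sample from the question
    apxSource = "arg(a).\narg(b).\natt(a,b)."

    for command in apx.parseString(apxSource):
        print(command.dump())
        print(command.cmd)
        print(command.args)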
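
Because every command carries the same "cmd" and "args" names, the dictionary asked for in the question can be assembled in a few lines of ordinary Python. This is only a sketch of one way to do it: the grammar is repeated so the block runs on its own, and `to_plain`, `apx_source` and `by_command` are names introduced here for illustration.

```python
from collections import defaultdict

from pyparsing import (Group, Keyword, OneOrMore, ParseResults, Suppress,
                       Word, alphanums, alphas, nums)

LPAR, RPAR, DOT, COMMA = map(Suppress, "().,")
arg, attack, pref, val, valpref, support = map(
    Keyword, "arg att pref val valpref support".split())

ID = Word(alphas, alphanums)
id_pair = Group(ID + COMMA + ID)
integer = Word(nums)
int_pair = Group(integer + COMMA + integer)

arg_cmd = Group(arg("cmd") + LPAR + ID("args") + RPAR)
attack_cmd = Group(attack("cmd") + LPAR + id_pair("args") + RPAR)
pref_cmd = Group(pref("cmd") + LPAR + id_pair("args") + RPAR)
val_cmd = Group(val("cmd") + LPAR + Group(ID + COMMA + integer)("args") + RPAR)
valpref_cmd = Group(valpref("cmd") + LPAR + int_pair("args") + RPAR)
support_cmd = Group(support("cmd") + LPAR + id_pair("args") + RPAR)

apx = OneOrMore((arg_cmd | attack_cmd | pref_cmd | val_cmd
                 | valpref_cmd | support_cmd) + DOT)


def to_plain(value):
    """Hypothetical helper: turn nested ParseResults into plain lists."""
    return value.asList() if isinstance(value, ParseResults) else value


apx_source = "arg(a).\narg(b).\natt(a,b)."  # sample input from the question

# Collect the arguments of each command under its keyword,
# keeping the order in which the commands appear in the input.
by_command = defaultdict(list)
for command in apx.parseString(apx_source):
    by_command[command.cmd].append(to_plain(command.args))

by_command = dict(by_command)
print(by_command)
```

For the sample input this should reproduce the mapping requested in the question, with "arg" collecting the single identifiers and "att" collecting the pairs.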
    

    If you want to follow your original plan of naming, I think it will look something like this.

    arg_cmd = Group(arg + LPAR + ID("arg") + RPAR)
    attack_cmd = Group(attack + LPAR + id_pair("attack") + RPAR)
    pref_cmd = Group(pref + LPAR + id_pair("pref") + RPAR)
    val_cmd = Group(val + LPAR + Group(ID + COMMA + integer)("val") + RPAR)
    valpref_cmd = Group(valpref + LPAR + int_pair("valpref") + RPAR)
    support_cmd = Group(support + LPAR + id_pair("support") + RPAR)
    

    Or this.

    arg_cmd = (arg + LPAR + ID("arg*") + RPAR)
    attack_cmd = (attack + LPAR + id_pair("attack*") + RPAR)
    pref_cmd = (pref + LPAR + id_pair("pref*") + RPAR)
    val_cmd = (val + LPAR + Group(ID + COMMA + integer)("val*") + RPAR)
    valpref_cmd = (valpref + LPAR + int_pair("valpref*") + RPAR)
    support_cmd = (support + LPAR + id_pair("support*") + RPAR)
    

    As you can see, there are many approaches to constructing these parsers and the resulting parsed structures, and the choice is as much a matter of personal style as of right vs. wrong. In the last two examples, there are no "cmd" or "args" names defined, so you'll have to remove them from the sample for-loop code above. If you are looking for dict-key-like results, I think the last structure will be closest to what you want. However, note that this parser discards the order in which the commands are found. If order is significant, you should probably use one of the first two samples, since the Group classes keep the command ordering intact.
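
To make the last variant concrete, here is a cut-down sketch using only the arg and att commands. The trailing "*" in a results name asks pyparsing to keep every match for that name rather than only the last one. The input string and the variable names `parsed` and `by_name` are illustrative assumptions, not part of the original answer.

```python
from pyparsing import Group, Keyword, OneOrMore, Suppress, Word, alphanums, alphas

LPAR, RPAR, DOT, COMMA = map(Suppress, "().,")
arg, attack = map(Keyword, "arg att".split())

ID = Word(alphas, alphanums)
id_pair = Group(ID + COMMA + ID)

# No Group around each command: the named results accumulate
# directly on the top-level ParseResults.
arg_cmd = arg + LPAR + ID("arg*") + RPAR
attack_cmd = attack + LPAR + id_pair("attack*") + RPAR

apx = OneOrMore((arg_cmd | attack_cmd) + DOT)

# Commands are deliberately interleaved in this sample input.
parsed = apx.parseString("arg(a).\natt(a,b).\narg(b).")

# asDict() gives the values grouped by results name.
by_name = parsed.asDict()
print(by_name)
```

In the dictionary view the two arg entries sit together under "arg", so the fact that an att command appeared between them is no longer visible there, which is the ordering caveat mentioned above.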