
Read Bunch() from string


I have the following string in a report file:

"Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])"

I would like to turn it into a Bunch() object or a dict, so that I can access the information inside (via either my_var.conditions or my_var["conditions"]).
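If you want a single object that supports both access styles, one common pattern is a dict subclass that forwards attribute lookups to its keys. This is a minimal sketch only. The class name AttrDict is hypothetical, and it is not the Bunch class your report string comes from:

```python
class AttrDict(dict):
    """Hypothetical dict subclass: my_var["key"] and my_var.key both work."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods still work
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Store attributes as keys so both access styles see the same data
        self[name] = value


my_var = AttrDict(conditions=['s1', 's2'], durations=[[30.0], [30.0]])
print(my_var.conditions)    # ['s1', 's2']
print(my_var["durations"])  # [[30.0], [30.0]]
```

This only covers the access side. The string still has to be parsed into a dict first.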

This works very well with eval():

eval("Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])")

However, I would like to avoid using eval().

I have tried writing a couple of string substitutions that convert it to dict syntax so I can parse it with json.loads(). However, that looks very hackish and will break as soon as I encounter new fields in future strings, e.g.:

("{" + "Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])"[len("Bunch("):-1] + "}").replace("conditions=", "'conditions':")

You get the idea.
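A stdlib-only alternative to both eval() and string substitution exists. It is a different technique from the pyparsing answer below. Python's own parser builds a syntax tree with ast.parse, and then ast.literal_eval evaluates only the argument values. literal_eval accepts literals such as strings, numbers and lists, but rejects calls and names. This is a minimal sketch, assuming every argument is a keyword argument with a literal value. parse_bunch is a hypothetical helper name:

```python
import ast


def parse_bunch(text):
    """Parse "Bunch(name=<literal>, ...)" into a dict without calling eval()."""
    tree = ast.parse(text, mode="eval")
    call = tree.body
    # Accept only a call to the bare name Bunch(...)
    if not (isinstance(call, ast.Call)
            and isinstance(call.func, ast.Name)
            and call.func.id == "Bunch"):
        raise ValueError("expected a Bunch(...) call")
    if call.args:
        raise ValueError("positional arguments are not supported")
    # literal_eval takes an AST node and raises ValueError for anything non-literal
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}


s = "Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'], durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])"
d = parse_bunch(s)
print(d["conditions"])  # ['s1', 's2', 's3', 's4', 's5', 's6']
```

The field names come from the parsed keywords rather than from hard-coded replacements, so new fields need no code changes.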

Do you know if there is any better way to parse this?


Solution

  • This pyparsing code will define a parsing expression for your Bunch declaration.

    from pyparsing import (pyparsing_common, Suppress, Keyword, Forward, quotedString, 
        Group, delimitedList, Dict, removeQuotes)
    
    # define pyparsing parser for the Bunch declaration
    LBRACK,RBRACK,LPAR,RPAR,EQ = map(Suppress, "[]()=")
    integer = pyparsing_common.integer
    real = pyparsing_common.real
    ident = pyparsing_common.identifier
    
    # define a recursive expression for nested lists
    listExpr = Forward()
    # copy() leaves pyparsing's shared module-level quotedString expression unmodified
    listItem = real | integer | quotedString.copy().setParseAction(removeQuotes) | Group(listExpr)
    listExpr << LBRACK + delimitedList(listItem) + RBRACK
    
    # define an expression for the Bunch declaration
    BUNCH = Keyword("Bunch")
    arg_defn = Group(ident + EQ + listItem)
    bunch_decl = BUNCH + LPAR + Dict(delimitedList(arg_defn))("args") + RPAR
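To make the structure of that grammar concrete, here is a rough stdlib-only sketch that mirrors the same rules with a regex tokenizer and a hand-written recursive-descent parser. It may also help if adding a pyparsing dependency is not an option. It is not pyparsing, and it does not replicate every detail. For example, it ignores escaped quotes inside strings and does not cover every number format pyparsing_common accepts. All names here are hypothetical:

```python
import re

# Token kinds, tried in order; REAL comes before INT so "30.0" is not split
TOKEN_SPEC = [
    ("REAL", r"[-+]?\d+\.\d*(?:[eE][-+]?\d+)?"),
    ("INT", r"[-+]?\d+"),
    ("STR", r"'[^']*'|\"[^\"]*\""),  # no escape handling in this sketch
    ("IDENT", r"[A-Za-z_]\w*"),
    ("PUNCT", r"[\[\](),=]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, text) pairs, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def expect(self, value):
        kind, tok = self.peek()
        if tok != value:
            raise ValueError(f"expected {value!r}, got {tok!r}")
        self.i += 1

    def item(self):
        # listItem: real | integer | quoted string | nested list
        kind, tok = self.peek()
        if tok == "[":
            return self.list_expr()
        self.i += 1
        if kind == "REAL":
            return float(tok)
        if kind == "INT":
            return int(tok)
        if kind == "STR":
            return tok[1:-1]  # strip the quotes, like removeQuotes
        raise ValueError(f"unexpected token {tok!r}")

    def list_expr(self):
        # listExpr: '[' item (',' item)* ']'  (recursive via item)
        self.expect("[")
        items = [self.item()]
        while self.peek()[1] == ",":
            self.i += 1
            items.append(self.item())
        self.expect("]")
        return items

    def bunch(self):
        # bunch_decl: 'Bunch' '(' ident '=' item (',' ident '=' item)* ')'
        self.expect("Bunch")
        self.expect("(")
        args = {}
        while True:
            kind, name = self.peek()
            if kind != "IDENT":
                raise ValueError(f"expected identifier, got {name!r}")
            self.i += 1
            self.expect("=")
            args[name] = self.item()
            if self.peek()[1] != ",":
                break
            self.i += 1
        self.expect(")")
        if self.i != len(self.tokens):
            raise ValueError("trailing input after Bunch(...)")
        return args


def parse_bunch_manual(text):
    return Parser(tokenize(text)).bunch()


text = "Bunch(conditions=['s1', 's2'], durations=[[30.0], [30.0]], onsets=[[172.77], [322.77]])"
print(parse_bunch_manual(text))
```

Each method corresponds to one named expression in the pyparsing version. This is the bookkeeping that pyparsing's Forward, delimitedList and Group handle for you.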
    

    Here is that parser run against your sample input:

    # run the sample input as a test
    sample = """Bunch(conditions=['s1', 's2', 's3', 's4', 's5', 's6'],
                      durations=[[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]],
                      onsets=[[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])"""
    bb = bunch_decl.parseString(sample)
    # print the parsed output as-is
    print(bb)
    

    Gives:

    ['Bunch', [['conditions', ['s1', 's2', 's3', 's4', 's5', 's6']], 
        ['durations', [[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]]], 
        ['onsets', [[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]]]]]
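Each arg_defn is a Group of a name and a value. As a result, the second element of that output is a list of [name, value] pairs, which is exactly the shape dict() accepts. The sketch below copies the printed structure as plain Python lists to show that conversion. It does not call pyparsing. With the real parse results, the answer's own t.args.asDict() call in the parse action further down does this for you:

```python
# The printed parse output, copied as plain Python lists (no pyparsing involved)
parsed = ['Bunch', [['conditions', ['s1', 's2', 's3', 's4', 's5', 's6']],
                    ['durations', [[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]]],
                    ['onsets', [[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]]]]]

keyword, pairs = parsed
# Each inner [name, value] list becomes one key/value entry
args = dict(pairs)
print(keyword)            # Bunch
print(args["durations"])  # [[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]]
```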
    

    With pyparsing, you can also add a parse-time callback, so that pyparsing will do the tokens->Bunch conversion for you:

    # define a simple placeholder class for Bunch
    class Bunch(object):
        def __init__(self, **kwargs):
            self.__dict__.update(kwargs)
        def __repr__(self):
            return "Bunch:(%s)" % ', '.join("%r: %s" % item for item in vars(self).items())
    
    # add this as a parse action, and pyparsing will autoconvert the parsed data to a Bunch
    bunch_decl.addParseAction(lambda t: Bunch(**t.args.asDict()))
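You may not need a class of your own. The standard library's types.SimpleNamespace behaves much like the placeholder Bunch above: keyword arguments become attributes, and vars() gives back the underlying dict for my_var["conditions"]-style access. This is a minimal sketch. The fields dict stands in for what t.args.asDict() returns for the sample input, and the parse-action swap is shown only as an untested comment:

```python
from types import SimpleNamespace

# Stand-in for the dict that t.args.asDict() produces for the sample input
fields = {
    "conditions": ['s1', 's2', 's3', 's4', 's5', 's6'],
    "durations": [[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]],
    "onsets": [[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]],
}

# Same effect as the placeholder Bunch: each key becomes an attribute
my_var = SimpleNamespace(**fields)
print(my_var.conditions)          # ['s1', 's2', 's3', 's4', 's5', 's6']
print(vars(my_var)["onsets"][0])  # [172.77]

# Hypothetical swap in the answer's parse action (not run here):
# bunch_decl.addParseAction(lambda t: SimpleNamespace(**t.args.asDict()))
```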
    

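    Re-running bunch_decl.parseString(sample) now gives you an actual Bunch instance as the single parsed result (bb[0], so bb[0].conditions works). Printing bb gives: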

    [Bunch:('durations': [[30.0], [30.0], [30.0], [30.0], [30.0], [30.0]], 
            'conditions': ['s1', 's2', 's3', 's4', 's5', 's6'], 
            'onsets': [[172.77], [322.77], [472.77], [622.77], [772.77], [922.77]])]