
Is there a way to get nested dictionaries in a pyparsing result?


I have the code here:

#parser.py
import pyparsing as pp

class parser:
    def __init__(self):
        self.integer = pp.Word(pp.nums).set_results_name('int')
        self.string1 = pp.QuotedString(quoteChar='"')
        self.string2 = pp.QuotedString(quoteChar="'")
        self.string = pp.Or([self.string1, self.string2]).set_results_name('str')
        self.object = pp.Or([self.string, self.integer])
        self.tuple = '(' + pp.delimited_list(self.object, delim=',') + ')'
        self.tuple = self.tuple.set_results_name('tuple')
        self.object = pp.Or([self.string, self.integer, self.tuple])

        self.varname = pp.Word(pp.alphas + "_").set_results_name('varname')
        self.let_ = pp.Keyword('let')
        self.const_ = pp.Keyword('const')
        self.var_ = pp.Keyword('var')
        self.set_ = pp.one_of(": =")
        self.variable = pp.Or([pp.Or([self.let_, self.const_, self.var_]) + self.varname + self.set_ + self.object,
                               self.varname + self.set_ + self.object])

    def parseVar(self, string):
        return self.variable.parse_string(string)

#main.py
from parser import parser
parse = parser()
print(parse.parseVar('hi = ("hi", 2)').as_dict())

And I get:

{'varname': 'hi', 'str': 'hi', 'int': '2', 'tuple': ['(', 'hi', '2', ')']}

But what I want to get is this:

{"varname": "hi", "tuple": {"str":"hi", "int":"2"}}

Is there any way I could get this result?


Solution

  • You are very close with this. You need to suppress the opening and closing parentheses from your parsed results, and wrap the tuple expression in a Group so that the names inside it nest under 'tuple'.

    This is pretty common with punctuation in parsing. The punctuation characters are super important during the parsing process, but post-parsing, they just get in the way. For your parser, I defined tuple as this:

            LPAR = pp.Suppress("(")
            RPAR = pp.Suppress(")")
            self.tuple = pp.Group(LPAR + pp.delimited_list(self.object, delim=',') + RPAR)
    

    after which I get the output that you said that you wanted.
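
To make the effect of Suppress and Group concrete, here is a minimal sketch that parses only the tuple part, once in the original style and once in the suggested style. The names item, items, flat_tuple and grouped_tuple are made up for this illustration, and it assumes pyparsing 3.x, where the snake_case method names are available.

```python
import pyparsing as pp

integer = pp.Word(pp.nums).set_results_name("int")
quoted = (pp.QuotedString('"') | pp.QuotedString("'")).set_results_name("str")
item = quoted | integer
items = pp.delimited_list(item, delim=",")

# Original style: the parentheses stay in the results and nothing is nested.
flat_tuple = ("(" + items + ")").set_results_name("tuple")

# Suggested style: drop the parentheses and group the contents, so the
# 'str' and 'int' names live inside the 'tuple' result.
LPAR = pp.Suppress("(")
RPAR = pp.Suppress(")")
grouped_tuple = pp.Group(LPAR + items + RPAR).set_results_name("tuple")

text = '("hi", 2)'
flat = flat_tuple.parse_string(text)
grouped = grouped_tuple.parse_string(text)

print(flat.as_list())     # ['(', 'hi', '2', ')']
print(grouped.as_list())  # [['hi', '2']]
print(grouped.as_dict())  # {'tuple': {'str': 'hi', 'int': '2'}}
```

The Group is what creates the sub-structure; Suppress only keeps the punctuation out of it.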

    I'm also curious why you use the Or([expr1, expr2, expr3]) style instead of expr1 | expr2 | expr3, or expr1 ^ expr2 ^ expr3 if you truly need the more expensive match-longest behavior of pyparsing's Or. To make your code easier for me to follow, the first thing I did was convert all those explicit constructions to ones using pyparsing's overloaded operators:

        def __init__(self):
            self.integer = pp.Word(pp.nums).set_results_name('int')
            self.string1 = pp.QuotedString(quoteChar='"')
            self.string2 = pp.QuotedString(quoteChar="'")
            self.string = (self.string1 | self.string2).set_results_name('str')
            self.object = self.string | self.integer
            LPAR = pp.Suppress("(")
            RPAR = pp.Suppress(")")
            self.tuple = pp.Group(LPAR + pp.delimited_list(self.object, delim=',') + RPAR)
            self.tuple = self.tuple.set_results_name('tuple')
            self.object = self.string | self.integer | self.tuple
    
            self.varname = pp.Word(pp.alphas + "_").set_results_name('varname')
            self.let_ = pp.Keyword('let')
            self.const_ = pp.Keyword('const')
            self.var_ = pp.Keyword('var')
            self.set_ = pp.one_of(": =")
            self.variable = pp.Optional(self.let_ | self.const_ | self.var_) + self.varname + self.set_ + self.object
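
As a small illustration of the operators used in the rewrite above: | builds a MatchFirst (the first alternative that matches wins), ^ builds an Or (all alternatives are tried and the longest match wins), and Optional replaces the pair of "with keyword" and "without keyword" alternatives. The literals and the names short, longer, keyword and decl below are only for demonstration, not part of the original parser.

```python
import pyparsing as pp

short = pp.Literal("a")
longer = pp.Literal("ab")

first_match = short | longer    # MatchFirst: stops at the first alternative that matches
longest_match = short ^ longer  # Or: evaluates every alternative, keeps the longest

print(first_match.parse_string("ab").as_list())    # ['a']
print(longest_match.parse_string("ab").as_list())  # ['ab']

# Optional makes the leading keyword optional in a single expression.
keyword = pp.Keyword("let") | pp.Keyword("const") | pp.Keyword("var")
name = pp.Word(pp.alphas + "_")
decl = pp.Optional(keyword) + name

print(decl.parse_string("let x").as_list())  # ['let', 'x']
print(decl.parse_string("x").as_list())      # ['x']
```

In the original grammar none of the alternatives overlap the way "a" and "ab" do here, so | should be enough there.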
    

    And in truth, the only one of these expressions that really needs to be attached to self is self.variable. All the rest can be written as local variables (though you will probably want to rename those like object and tuple, which clash with Python builtins).

        def __init__(self):
            integer = pp.Word(pp.nums).set_results_name('int')
            string1 = pp.QuotedString(quoteChar='"')
            string2 = pp.QuotedString(quoteChar="'")
            string = (string1 | string2).set_results_name('str')
            object = string | integer
            LPAR = pp.Suppress("(")
            RPAR = pp.Suppress(")")
            tuple = pp.Group(LPAR + pp.delimited_list(object, delim=',') + RPAR)
            tuple = tuple.set_results_name('tuple')
            object = string | integer | tuple
    
            varname = pp.Word(pp.alphas + "_").set_results_name('varname')
            let_ = pp.Keyword('let')
            const_ = pp.Keyword('const')
            var_ = pp.Keyword('var')
            set_ = pp.one_of(": =")
            self.variable = pp.Optional(let_ | const_ | var_) + varname + set_ + object
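
For completeness, here is one possible way to put the pieces together with the builtin-clashing names changed. This is only a sketch: the class name VarParser, the method name parse_var, and the local names quoted, scalar, tuple_expr, value, keyword and assign are hypothetical choices, not anything required by pyparsing.

```python
import pyparsing as pp


class VarParser:
    """Hypothetical renamed version of the question's parser class."""

    def __init__(self):
        integer = pp.Word(pp.nums).set_results_name("int")
        quoted = (pp.QuotedString('"') | pp.QuotedString("'")).set_results_name("str")
        scalar = quoted | integer

        # Suppressed parentheses plus Group give the nested 'tuple' result.
        LPAR = pp.Suppress("(")
        RPAR = pp.Suppress(")")
        tuple_expr = pp.Group(LPAR + pp.delimited_list(scalar, delim=",") + RPAR)
        tuple_expr = tuple_expr.set_results_name("tuple")
        value = quoted | integer | tuple_expr

        varname = pp.Word(pp.alphas + "_").set_results_name("varname")
        keyword = pp.Keyword("let") | pp.Keyword("const") | pp.Keyword("var")
        assign = pp.one_of(": =")
        # Only the top-level expression is kept on the instance.
        self.variable = pp.Optional(keyword) + varname + assign + value

    def parse_var(self, text):
        return self.variable.parse_string(text)


var_parser = VarParser()
nested = var_parser.parse_var('hi = ("hi", 2)').as_dict()
simple = var_parser.parse_var("let count = 5").as_dict()
print(nested)  # {'varname': 'hi', 'tuple': {'str': 'hi', 'int': '2'}}
print(simple)  # {'varname': 'count', 'int': '5'}
```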