
Extract data from within parentheses in Python


I know there are many questions with the same title. My situation is a little different. I have a string like:

"Cat(Money(8)Points(80)Friends(Online(0)Offline(8)Total(8)))Mouse(Money(10)Points(10000)Friends(Online(10)Offline(80)Total(90)))"

(Notice that some parentheses are nested inside others.)

and I need to parse it into nested dictionaries, for example:

d["Cat"]["Money"] == 8
d["Cat"]["Points"] == 80
d["Mouse"]["Friends"]["Online"] == 10

and so on. I would like to do this without libraries or regex. If you do use them, please explain the code in detail. Thanks in advance!

Edit:

This code probably won't make much sense, but this is what I have so far:

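o_str = "Jake(Money(8)Points(80)Friends(Online(0)Offline(8)Total(8)))Mouse(Money(10)Points(10000)Friends(Online(10)Offline(80)Total(90)))"
spl = o_str.split("(")

def reverseIndex(str1, str2):
    # Distance from the last occurrence of str2 to the end of str1
    try:
        return len(str1) - str1.rindex(str2)
    except ValueError:  # str2 does not occur in str1
        return len(str1)

def app(arr, end):
    # Re-attach the separator that split() removed to every piece but the last
    new_arr = []
    for i in range(0, len(arr)):
        if i < len(arr) - 1:
            new_arr.append(arr[i] + end)
        else:
            new_arr.append(arr[i])
    return new_arr

spl = app(spl, "(")
ends = []
end_words = []
op = 0  # pieces seen so far (each piece but the last ends with "(")
cl = 0  # closing parentheses seen so far
for i in range(0, len(spl)):
    print(i)
    cl += spl[i].count(")")
    op += 1
    if cl == op - 1:
        # Back at the top level: this piece ends one top-level entry and/or starts the next
        ends.append(i)
        end_words.append(spl[i])
        #break
    print(op)
    print(cl)
    print()
print(end_words)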

The end words are the sections at the beginning of each statement. I plan to use recursion to do the rest.
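A minimal sketch of that plan using only built-ins (no libraries, no regex): scan the string while tracking parenthesis depth to cut it into top-level `Name(...)` pieces, then recurse into each body. The helper names `split_top_level` and `to_dict` are hypothetical, and the sketch assumes well-formed input whose leaves are integers.

```python
def split_top_level(text):
    """Split 'A(...)B(...)' into [('A', '...'), ('B', '...')] at depth 0."""
    parts = []
    depth = 0
    name_start = 0   # index where the current top-level name begins
    body_start = 0   # index just after that name's opening '('
    name = ""
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                name = text[name_start:i]
                body_start = i + 1
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                parts.append((name, text[body_start:i]))
                name_start = i + 1
    return parts


def to_dict(text):
    """Recursively turn 'A(...)B(...)' into nested dicts; bare text is a leaf."""
    parts = split_top_level(text)
    if not parts:
        # No parentheses at this level (e.g. '8'), so it is a leaf value.
        return int(text)
    return {name: to_dict(body) for name, body in parts}


s = "Cat(Money(8)Points(80)Friends(Online(0)Offline(8)Total(8)))Mouse(Money(10)Points(10000)Friends(Online(10)Offline(80)Total(90)))"
d = to_dict(s)
print(d["Cat"]["Money"], d["Mouse"]["Friends"]["Online"])  # 8 10
```

Each character is re-scanned once per nesting level that contains it, which is harmless for strings like this one.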


Solution

  • Now that was interesting. You really nerd-sniped me on this one...

    def parse(tokens):
        """ take iterator of tokens, parse to dictionary or atom """
        dictionary = {}
        # iterate tokens...
        for token in tokens:
            if token == ")" or next(tokens) == ")":
                # token is ')' -> end of dict; next is ')' -> 'leaf'
                break
            # add sub-parse to dictionary
            dictionary[token] = parse(tokens)
        # return dict, if non-empty, else token
        return dictionary or int(token)
    

    Setup and demo:

    >>> from pprint import pprint
    >>> s = "Cat(Money(8)Points(80)Friends(Online(0)Offline(8)Total(8)))Mouse(Money(10)Points(10000)Friends(Online(10)Offline(80)Total(90)))"
    >>> tokens = iter(s.replace("(", " ( ").replace(")", " ) ").split())
    >>> pprint(parse(tokens))
    {'Cat': {'Friends': {'Offline': 8, 'Online': 0, 'Total': 8},
             'Money': 8,
             'Points': 80},
     'Mouse': {'Friends': {'Offline': 80, 'Online': 10, 'Total': 90},
               'Money': 10,
               'Points': 10000}}
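The replace/split trick in the setup is the whole tokenizer. Wrapped as a small helper (the name `tokenize` is just for illustration), it is easier to inspect on a shorter input:

```python
def tokenize(text):
    # Pad each parenthesis with spaces so str.split() separates it from names and numbers.
    return text.replace("(", " ( ").replace(")", " ) ").split()


print(tokenize("Cat(Money(8)Points(80))"))
# ['Cat', '(', 'Money', '(', '8', ')', 'Points', '(', '80', ')', ')']
```

Every number is immediately followed by ')', while every name is followed by '('. That difference is what the `next(tokens) == ")"` check in `parse` relies on to recognise a leaf.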
    

    Alternatively, you could use a series of string replacements to turn the string into a Python dictionary literal and then evaluate that, for example:

    as_dict = eval("{'" + s.replace(")", "'}, ")
                           .replace("(", "': {'")
                           .replace(", ", ", '")
                           .replace(", ''", "")[:-3] + "}")
    

    This wraps the leaves in singleton sets of strings, e.g. {'8'} instead of 8, but that is easy to fix in a post-processing step.
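One way to do that post-processing, sketched under the assumption that every leaf is an integer: build the same literal string, parse it with `ast.literal_eval` (a swap for `eval` that only accepts Python literals, so it will not execute arbitrary code), then walk the result and unwrap each singleton set. The names `build_literal` and `unwrap_leaves` are hypothetical.

```python
import ast


def build_literal(s):
    # The same chain of replacements as the eval() snippet: turns the input
    # into a dict literal whose leaves are one-element sets such as {'8'}.
    return ("{'" + s.replace(")", "'}, ")
                    .replace("(", "': {'")
                    .replace(", ", ", '")
                    .replace(", ''", "")[:-3] + "}")


def unwrap_leaves(value):
    # Post-processing: replace each singleton set {'8'} with the int 8.
    if isinstance(value, set):
        (only,) = value
        return int(only)
    return {key: unwrap_leaves(sub) for key, sub in value.items()}


s = "Cat(Money(8)Points(80)Friends(Online(0)Offline(8)Total(8)))Mouse(Money(10)Points(10000)Friends(Online(10)Offline(80)Total(90)))"
# ast.literal_eval only accepts literals, so unlike eval() it cannot run arbitrary code.
d = unwrap_leaves(ast.literal_eval(build_literal(s)))
print(d["Mouse"]["Friends"]["Total"])  # 90
```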