Search code examples
pythonjsongraphdecision-tree

Parse string of paths into JSON object


I have a bunch of strings (1) that represent a graph/decision tree (2). Each character represents a path along the graph.

###1###
'0A1B'
'0A1A'
'0A0A'
'0A0A'
'0B10'
'0B11'

enter image description here

I want to process this list of strings in Python to create a JSON variable with the following structure (3):

###3###
{
"name": "0",
"count": 6,
"children": [
    {
    "name": "A",
    "count": 4,
    "children": [
        {"name": "0",
        "count": 2,
        "children": [
            {"name": "A", "count": 2}
        ]
        },
        {"name": "1",
        "count": 2,
        "children": [
            {"name": "A", "count": 1},
            {"name": "B", "count": 1}
        ]
        }
    ]
    },
    {
    "name": "B",
    "count": 2,
    "children": [
        {"name": "1",
        "count": 2,
        "children": [
            {"name": "0", "count": 1},
            {"name": "1", "count": 1}
        ]
        }
    ]
    }
]
}

Is there a library that can make this easier? I'm able to use the json library to create json objects, but not sure how to parse the string. It seems like a recursive function is necessary?


Solution

  • I don't think you need any specific libraries, other than the built-in json:

    import json
    
    
    def dict_tree(ss):
        # there has to be at least one string and it has to have at least 1 character
        assert ss and ss[0]
        result = {'name': ss[0][0], 'count': 0}
        for s in ss:
            # all strings should start with the same character 
            # (the suggested data structure does not support more than one name at the root level)
            assert s and s[0] == result['name']
            p = result
            p['count'] += 1
            for ch in s[1:]:
                if 'children' not in p:
                    p['children'] = []
                for child in p['children']:
                    if child['name'] == ch:
                        p = child
                        break
                else:
                    p['children'].append({'name': ch, 'count': 0})
                    p = p['children'][-1]
                p['count'] += 1
        return result
    
    
    def main():
        strings = [
            '0A1B',
            '0A1A',
            '0A0A',
            '0A0A',
            '0B10',
            '0B11'
        ]
        print(json.dumps(dict_tree(strings), indent=4))
    
    
    main()