Tags: python, pandas, dataframe, dictionary, streamlit

How to create a nested dict with a parent-child hierarchy for streamlit_tree_select from a dataframe?


To use streamlit_tree_select I need to convert a dataframe to its expected structure.

I suspect I could use DataFrame.groupby('parkey') to group the children, but I'm not sure how to attach each group to the appropriate parent while iterating over the groups.

The dataframe holding categories:

import pandas as pd

data = [
  {"idnr": 1,"parkey": 0,"descr": "A","info":"string"},
  {"idnr": 2,"parkey": 0,"descr": "B","info":"string"},
  {"idnr": 3,"parkey": 2,"descr": "B B 1","info":"string"},
  {"idnr": 4,"parkey": 3,"descr":"B B B 1","info":"string"},
  {"idnr": 5,"parkey": 3,"descr":"B B B 2","info":"string"}
]

df = pd.DataFrame(data)

The expected output:

output = [
  {"idnr": 1,"parkey": 0,"descr": "A","info":"string"},
  {"idnr": 2,"parkey": 0,"descr": "B","info":"string","children":[
         {"idnr": 3,"parkey": 2,"descr": "B B 1","info":"string","children":[
            {"idnr": 4,"parkey": 3,"descr":"B B B 1","info":"string"},
            {"idnr": 5,"parkey": 3,"descr":"B B B 2","info":"string"}
         ]}
      ]
  }
]

Solution

  • One way to do this is to pre-process the data into a dict that maps each parent id to the list of its children. You can then start from the entry under key 0 (the top-level items) and recursively attach the children from the dict to the appropriate children list:

    def add_child(tree, child):
        key = child['parkey']
        tree[key] = tree.get(key, []) + [child]
    
    parents = dict()
    for child in data:
        add_child(parents, child)
    

    Output:

    {
     0: [
      {'idnr': 1, 'parkey': 0, 'descr': 'A', 'info': 'string'},
      {'idnr': 2, 'parkey': 0, 'descr': 'B', 'info': 'string'}
     ],
     2: [
      {'idnr': 3, 'parkey': 2, 'descr': 'B B 1', 'info': 'string'}
     ],
     3: [
      {'idnr': 4, 'parkey': 3, 'descr': 'B B B 1', 'info': 'string'},
      {'idnr': 5, 'parkey': 3, 'descr': 'B B B 2', 'info': 'string'}
     ]
    }
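The grouping step above can also be written with collections.defaultdict, which appends to each list in place instead of building a new list for every record. This is only a sketch of the same idea; the helper name group_by_parent and its parent_key parameter are made up for illustration and are not part of the original answer.

```python
from collections import defaultdict

data = [
    {"idnr": 1, "parkey": 0, "descr": "A", "info": "string"},
    {"idnr": 2, "parkey": 0, "descr": "B", "info": "string"},
    {"idnr": 3, "parkey": 2, "descr": "B B 1", "info": "string"},
    {"idnr": 4, "parkey": 3, "descr": "B B B 1", "info": "string"},
    {"idnr": 5, "parkey": 3, "descr": "B B B 2", "info": "string"},
]


def group_by_parent(records, parent_key="parkey"):
    """Map each parent id to the list of records that point at it.

    Hypothetical helper: the name and the parent_key default are
    assumptions chosen to match the example data.
    """
    groups = defaultdict(list)
    for record in records:
        # Append in place; the records themselves are not copied.
        groups[record[parent_key]].append(record)
    # Return a plain dict so later [] lookups cannot create empty entries.
    return dict(groups)


parents = group_by_parent(data)
```

The result has the same shape as the parents dict shown above, so it can be passed to the recursive step unchanged.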
    

    Now you can iterate over the entries in parents[0], recursively adding children as you go:

    def add_children(tree, parents):
        for child in tree:
            # does this entry have any children?
            idnr = child['idnr']
            if idnr in parents:
                # add the children
                child['children'] = parents[idnr]
                add_children(child['children'], parents)
    
    output = parents[0]
    add_children(output, parents)
    

    Output:

    [
      {'idnr': 1, 'parkey': 0, 'descr': 'A', 'info': 'string'},
      {'idnr': 2, 'parkey': 0, 'descr': 'B', 'info': 'string', 'children': [
          {'idnr': 3, 'parkey': 2, 'descr': 'B B 1', 'info': 'string', 'children': [
              {'idnr': 4, 'parkey': 3, 'descr': 'B B B 1', 'info': 'string'},
              {'idnr': 5, 'parkey': 3, 'descr': 'B B B 2', 'info': 'string'}
            ]
          }
        ]
      }
    ]
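If the tree component needs different key names than the dataframe columns, the nested list can be remapped in one more recursive pass. The sketch below assumes that streamlit_tree_select expects each node to carry "label" and "value" keys and that values should be strings; both are assumptions to check against the component's README. The function to_nodes and its parameters are hypothetical names.

```python
# A small nested tree in the shape produced above.
tree = [
    {"idnr": 1, "parkey": 0, "descr": "A", "info": "string"},
    {"idnr": 2, "parkey": 0, "descr": "B", "info": "string", "children": [
        {"idnr": 3, "parkey": 2, "descr": "B B 1", "info": "string", "children": [
            {"idnr": 4, "parkey": 3, "descr": "B B B 1", "info": "string"},
            {"idnr": 5, "parkey": 3, "descr": "B B B 2", "info": "string"},
        ]},
    ]},
]


def to_nodes(items, label_key="descr", value_key="idnr"):
    """Return a new nested list that uses 'label'/'value'/'children' keys.

    Assumption: 'label' and 'value' are the key names the tree
    component wants; adjust them if the README says otherwise.
    """
    nodes = []
    for item in items:
        # str() is an assumption: the component may want string values.
        node = {"label": item[label_key], "value": str(item[value_key])}
        if item.get("children"):
            node["children"] = to_nodes(item["children"], label_key, value_key)
        nodes.append(node)
    return nodes


nodes = to_nodes(tree)
```

The input tree is not modified; to_nodes builds fresh dicts at every level.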
    

    Notes:

    1. The add_children routine modifies the dicts in data in place, because parents holds references to them rather than copies. If you don't want that, make a deep copy of data first (for example with copy.deepcopy) or change add_child to store a copy of each child.
    2. You could combine add_child and add_children; however, splitting the task into two passes means that data does not have to be sorted by parkey.
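Both notes can be addressed at once with a variant that first indexes a copy of every record by its id and then attaches each copy to its parent. This is a sketch under a few assumptions rather than a replacement for the answer above: build_tree and its parameters are hypothetical names, ids are assumed to be unique, and any record whose parkey matches no idnr (such as parkey 0 here) is treated as a top-level entry.

```python
data = [
    {"idnr": 1, "parkey": 0, "descr": "A", "info": "string"},
    {"idnr": 2, "parkey": 0, "descr": "B", "info": "string"},
    {"idnr": 3, "parkey": 2, "descr": "B B 1", "info": "string"},
    {"idnr": 4, "parkey": 3, "descr": "B B B 1", "info": "string"},
    {"idnr": 5, "parkey": 3, "descr": "B B B 2", "info": "string"},
]


def build_tree(records, id_key="idnr", parent_key="parkey"):
    """Build the nested list without modifying the input records."""
    # Pass 1: copy every record and index the copies by id.
    nodes = {record[id_key]: dict(record) for record in records}
    roots = []
    # Pass 2: attach each copy to its parent; input order does not matter
    # because every node already exists in the index.
    for node in nodes.values():
        parent = nodes.get(node[parent_key])
        if parent is None:
            # Assumption: an unknown parent id means a top-level entry.
            roots.append(node)
        else:
            parent.setdefault("children", []).append(node)
    return roots


output = build_tree(data)
```

When starting from the dataframe rather than a list, df.to_dict("records") produces a list of row dicts in the same shape as data, which can be passed to build_tree directly.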