Search code examples
pythonpandashierarchical-data

How to create a parent-child dataframe from an existing pandas dataframe of hierarchical structure and arbitrary shape?


How do you convert a given dataframe with a hierarchical structure and arbitrary shape (say, similar to the one below) into a new dataframe with a parent and child column?

Edit: Note that a constraint is that a child cannot be its own parent.

data = {'level1': ['A', 'A', 'B', 'B', 'C'],
        'level2': ['James', 'Robert', 'Patricia', 'Patricia', 'John'],
        'level3': ['Stockholm', 'Denver', 'Moscow', 'Moscow', 'Palermo'],
        'level4': ['red', 'Denver', 'yellow', 'purple', 'blue']
        }

df = pd.DataFrame(data)

  level1    level2     level3  level4
0      A     James  Stockholm     red
1      A    Robert     Denver  Denver
2      B  Patricia     Moscow  yellow
3      B  Patricia     Moscow  purple
4      C      John    Palermo    blue

Desired output is something like this:

       parent      child
0           A      James
1           A     Robert
2           B   Patricia
3           C       John
4       James  Stockholm
5      Robert     Denver
6    Patricia     Moscow
7        John    Palermo
8   Stockholm        red
9      Moscow     yellow
10     Moscow     purple
11    Palermo       blue

Solution

  • What I can think of is using a for loop over the columns of the dataframe:

    columns = df.columns
    length = len(columns)
    parent = []
    children = []
    for i in range(length):
      if i != length - 1 :
        parent += df[columns[i]].to_list()
        children += df[columns[i+1]].to_list()
    newDf = pd.DataFrame({"parent":parent, "children":children}).drop_duplicates()
    newDf[newDf["parent"] != newDf["children"]]
    

    Output

           parent   children
    0           A      James
    1           A     Robert
    2           B   Patricia
    4           C       John
    5       James  Stockholm
    6      Robert     Denver
    7    Patricia     Moscow
    9        John    Palermo
    10  Stockholm        red
    12     Moscow     yellow
    13     Moscow     purple
    14    Palermo       blue