Creating columns with hierarchal relationship to each other from two original columns

I am working on this project and I need to create columns with hierarchal relationship to each other in pandas based on the original two columns that indicates hierarchal relationships of each value.

	supervisory_org	Superior_org
0	org_2	org_1
1	org_7	org_3
2	org_4	org_2
3	org_6	org_3
4	org_9	org_5
5	org_3	org_1
6	org_5	org_3
7	org_8	org_5

Above are the two original columns that indicates the relationship between two organizations(values). I want to make those hierarchal relationship between orgs more visible by spreading across multiple columns as below. (Below is the desired output)

	Level_1	Level_2	Level_3	Level_4
0	org_1	org_2	org_4	NaN
1	org_1	org_3	org_6	NaN
2	org_1	org_3	org_7	NaN
3	org_1	org_3	org_5	org_8
4	org_1	org_3	org_5	org_9

I'm trying to make this work in pandas but still haven't come up with a way to do it. Anyone have any suggestions on how to approach this problems?

This is the code for creating the first table

df = pd.DataFrame({"supervisory_org" : ["org_2","org_7" ,"org_4", "org_6","org_9","org_3","org_5", "org_8"],
                   "Superior_org" : ["org_1","org_3","org_2","org_3","org_5","org_1","org_3","org_5"]})

Solution

Find the roots (those with no org over them)
Use merge iteratively to match superior -> zero or more next rank orgs

The usual merge behavior with matching one to many ensures we duplicate rows for superior orgs as appropriate when needed.

We rename and remove columns as needed in the loop to handle the iterated merge.

import pandas as pd

df = pd.DataFrame({"supervisory_org" : ["org_2","org_7" ,"org_4", "org_6","org_9","org_3","org_5", "org_8"],
                   "Superior_org" : ["org_1","org_3","org_2","org_3","org_5","org_1","org_3","org_5"]})

# Find the roots - orgs without superiors
roots = (pd.Index(df['Superior_org'].unique()).difference(df['supervisory_org'])
         .to_series()
         .rename("Level_1")
        )
roots

org_1    org_1
Name: Level_1, dtype: object

# Use merge to find those ranked just below their superiors

result = roots
upper_col = "Superior_org"
lower_col = 'supervisory_org'
for index in range(1, 100):
    result = pd.merge(result, df,
                      left_on=f"Level_{index}",
                      right_on=upper_col,
                      how="left")
    result.pop(upper_col)
    # if there are no more lower levels, done
    if result[lower_col].isna().all():
        result.pop(lower_col)
        break
    result = result.rename(columns={lower_col: f'Level_{index + 1}'})
    
result

  Level_1 Level_2 Level_3 Level_4
0   org_1   org_2   org_4     NaN
1   org_1   org_3   org_7     NaN
2   org_1   org_3   org_6     NaN
3   org_1   org_3   org_5   org_9
4   org_1   org_3   org_5   org_8

Sorting

The sorting rule apparent from your example is first by depth, then alphabethical.

So we can compute the depth, then sort, and remove the depth column.

result.insert(0, 'depth', result.notna().sum(axis=1))

result = (result.sort_values(by=list(result.columns), ignore_index=True)
          .drop(columns='depth')
         )
result

  Level_1 Level_2 Level_3 Level_4
0   org_1   org_2   org_4     NaN
1   org_1   org_3   org_6     NaN
2   org_1   org_3   org_7     NaN
3   org_1   org_3   org_5   org_8
4   org_1   org_3   org_5   org_9