Search code examples
pythonexcelpandastreehierarchy

Transform hierarchy from Excel columns into rows


I have Excel with 3 columns:

OrgunitCode    ParentOrgunitCode     OrgunitName

1500                  Nan                 Head_Org

3200                  1500              2nd_level_Org

3201                  1500              other_2nd_lever_Org

..............................................................

971248197              827484           n_level_Org

103048197              513834           n2_level_Org

I need to transform it to full hierarchy rows (with Excel or Python):

OrgunitName OrgunitCode    OrgunitName(lvl_2) OrgunitCode(lvl_2) .. OrgunitName(lvl_5) OrgunitCode(lvl_5)

Head_Org        1500        2nd_level_Org          3200                  n_level_Org         971248197 

Head_Org        1500        2nd_level_Org          3200                  n2_level_Org         513834

..............................................................

So I thought to use PivotTables, or Lookup in excel, but if I start searching from "Parent", I dont know how many "Childrens", so I confused. And I dont know how to start from lowest point, because how to find them


Solution

  • Ohhh... the joy of working with hierarchies. Always interesting.

    First off, when working with parent-child relationships, I would always recommend to build an organizational node like '/' for level 1; '/1/', '/2/' ... '/n/' for level 2; '/1/1', '/2/1/', '/2/2/'... '/n/n/' for level 3 etc. This will make your work with hierarchies a lot easier (like sorting, finding descendants, siblings and ancestors etc. of the different departments).

    You don't really need an organizational node it in your case, but I have included it for future reference in column D:

    This assumes that row 1 is a header and that the data is organized as:

    A: OrgunitCode 
    B: ParentOrgunitCode 
    C: OrgunitName
    

    Then add these formulas in row 2:

    D: OrgNode (not used but useful in future)
    =IF(COUNTIFS($A:$A,B2)=0,"/"&A2&"/",VLOOKUP(B2,$A:$D,4,FALSE)&$A2&"/")
    
    E: Hierarchy level
    =IF(COUNTIFS($A:$A,$B2)=0,1,VLOOKUP($B2,$A:$E,5,FALSE)+1)
    

    The next is the hierarchy you asked for. Paste this formula into cells E2 and F2 respectively and just drag them to cover all relevant rows and hierarchy levels.

    F: OrgunitName
    =IF($E2=(COLUMN()-4)/2,$C2,IF($E2<(COLUMN()-4)/2,"",VLOOKUP($B2,$A:F,COLUMN(),FALSE)))
    
    G: OrgunitCode
    =IF($E2=(COLUMN()-5)/2,$A2,IF($E2<(COLUMN()-5)/2,"",VLOOKUP($B2,$A:G,COLUMN(),FALSE)))
    

    Note, that the lookup column in G and F are relative to avoid changing the number or adding a number row to reference, when extending the formula to the right. However, if the columns are not split exactly as above, you may have to replace the 'COLUMN()' parts with numbers or number references.

    And one more thing: VLOOKUP works here, but note that it is a heavy operation for Excel, so if you have a very big hierarchy, you may want to consider using the hierarchy function in SQL, recurring functions in MySQL, or similar.

    Formulas dissected

    EDIT: Formula explanation as requested (I think it is a mess, but don't know a better way to do it):

    D: OrgNode (not used but useful in future)
    =IF(COUNTIFS($A:$A,B2)=0,"/"&A2&"/",VLOOKUP(B2,$A:$D,4,FALSE)&$A2&"/")
    
    Builds a hierarchy node with the parent/child relationship seperated by '/':
    'COUNTIFS($A:$A,B2)=0'              <- checks to see if there is a parent. 
    "/"&A2&"/"                          <- ...If no, then this is the/a top node, so take the OrgCode and wrap it in '/'
    VLOOKUP(B2,$A:$D,4,FALSE)&$A2&"/")  <- Else, look up the parent's orgnode and add this department's OrgCode and '/' to the parent's.
    
    
    E: Hierarchy level
    =IF(COUNTIFS($A:$A,$B2)=0,1,VLOOKUP($B2,$A:$E,5,FALSE)+1)
    
    Just determines the hierarchy level of the dept in question.
    =IF(COUNTIFS($A:$A,$B2)=0       <- checks to see if there is a parent.
    1                               <- if not, then make this hierarchy level 1
    VLOOKUP($B2,$A:$E,5,FALSE)+1)   <- Else, lookup the hierarchy level of the parent and add 1
    
    F: OrgunitName
    =IF($E2=(COLUMN()-4)/2,$C2,IF($E2<(COLUMN()-4)/2,"",VLOOKUP($B2,$A:F,COLUMN(),FALSE)))
    
    =IF($E2=(COLUMN()-4)/2,     <- checks to see if the current hierarchy column is equal to the hierarchy level of the current department 
    $C2                         <- if so, return the dept name
    IF($E2<(COLUMN()-4)/2       <- else check if the current hierarchy column is lower than the hierarchy level of the current department
    ""                          <- if so, remain blank
    VLOOKUP($B2,$A:F,COLUMN(),FALSE))) <- else lookup the value in this column of the parent department.
    
    G: OrgunitCode 
    =IF($E2=(COLUMN()-5)/2,$A2,IF($E2<(COLUMN()-5)/2,"",VLOOKUP($B2,$A:G,COLUMN(),FALSE)))
    
    - Same explanation as above, just a different column reference and returns department code instead
    =IF($E2=(COLUMN()-5)/2
    $A2
    IF($E2<(COLUMN()-5)/2
    ""
    VLOOKUP($B2,$A:G,COLUMN(),FALSE)))
    

    And to carve it out:

    (COLUMN()-4)/2  <- Instead of hardcoding the hierarhy levels, this uses the column index to calculate the relevant hierarchy level in the given column. Column F is index 6: (6-4)/2 = 1. Column H is index 8: (8-4)/2 = 2. etc.
    
    COLUMN()        <- In the VLOOKUP reference: We all know VLOOKUP is annoying beacause we have to manually change the number in the col_index_num parameter when dragging across columns. But similar to above, we can use the column index number as a relative reference. Hence her, column F is index 6 and will look up the 6th column. Column G will return the 7th, H the 8th... you get the point.