Search code examples
rdplyrreshapehierarchical-datatidy

Create Parent/Child ID's for Hierarchical Data in R


I have some hierarchical data that I am trying to get into a more relational style table. I want to mutate a column to my dataframe that tells me the parent to each row. Here, Lights is a child to Electrical, and Wiring and Bulbs are both children to Lights, and so on. Is there a way I can say "if Level = 2, find the nearest Level 1 above it in the dataframe, and assign that as its parent?" Another example would be: if the row is a Level 3 item (Wiring), its parent would be the nearest Level 2 above it in the dataframe (Lights).

I would need the nearest Level above it since the data is stored hierarchically.

Level   Group    Name     Price
1        01    Electrical  200
2        01    Lights      82
3        01    Wiring      60
3        01    Bulbs       22
2        01    Vent        118
3        01    Detector    60
3        01    Cable       20
3        01    Plastic     25
3        01    Brackets    13

1        02    Interior    500
2        02    Doors       300
3        02    Knobs       40
2        02    Trim        200
3        02    Shelves     150
4        02    Brackets    50 

Desired Output would be something like:

Level   Group    Name     Price  ID Parent
1        01    Electrical  200   1   None 
2        01    Lights      82    2    1
3        01    Wiring      60    3    2
3        01    Bulbs       22    4    2
2        01    Vent        118   5    1
3        01    Detector    60    6    5
3        01    Cable       20    7    5
3        01    Plastic     25    8    5
3        01    Brackets    13    9    5

1        02    Interior    500   10  None
2        02    Doors       300   11   10
3        02    Knobs       40    12   11
2        02    Trim        200   13   10
3        02    Shelves     150   14   13
4        02    Brackets    50    15   14 

Solution

  • Here is one way to do this but I think there should be other better/efficient solution to this.

    library(dplyr)
    library(purrr)
    
    df %>%
      mutate(ID = row_number()) %>%
      group_by(Group) %>%
      mutate(Parent = map2_dbl(Level, ID, ~{
        tmp <- ID[Level == .x - 1]
        val <- tmp - .y
        if(any(val < 0)) max(tmp[val < 0]) else NA
        })) %>%
      ungroup
    
    #   Level Group Name       Price    ID Parent
    #   <int> <int> <chr>      <int> <int>  <dbl>
    # 1     1     1 Electrical   200     1     NA
    # 2     2     1 Lights        82     2      1
    # 3     3     1 Wiring        60     3      2
    # 4     3     1 Bulbs         22     4      2
    # 5     2     1 Vent         118     5      1
    # 6     3     1 Detector      60     6      5
    # 7     3     1 Cable         20     7      5
    # 8     3     1 Plastic       25     8      5
    # 9     3     1 Brackets      13     9      5
    #10     1     2 Interior     500    10     NA
    #11     2     2 Doors        300    11     10
    #12     3     2 Knobs         40    12     11
    #13     2     2 Trim         200    13     10
    #14     3     2 Shelves      150    14     13
    #15     4     2 Brackets      50    15     14
    

    We create and ID column which is row number in the dataframe and for each Level in the group we find out the nearest ID value which is one level above the current level.