I have some hierarchical data that I am trying to get into a more relational style table. I want to mutate a column to my dataframe that tells me the parent to each row. Here, Lights is a child to Electrical, and Wiring and Bulbs are both children to Lights, and so on. Is there a way I can say "if Level = 2, find the nearest Level 1 above it in the dataframe, and assign that as its parent?" Another example would be: if the row is a Level 3 item (Wiring), its parent would be the nearest Level 2 above it in the dataframe (Lights).
I would need the nearest Level above it since the data is stored hierarchically.
Level Group Name Price
1 01 Electrical 200
2 01 Lights 82
3 01 Wiring 60
3 01 Bulbs 22
2 01 Vent 118
3 01 Detector 60
3 01 Cable 20
3 01 Plastic 25
3 01 Brackets 13
1 02 Interior 500
2 02 Doors 300
3 02 Knobs 40
2 02 Trim 200
3 02 Shelves 150
4 02 Brackets 50
Desired Output would be something like:
Level Group Name Price ID Parent
1 01 Electrical 200 1 None
2 01 Lights 82 2 1
3 01 Wiring 60 3 2
3 01 Bulbs 22 4 2
2 01 Vent 118 5 1
3 01 Detector 60 6 5
3 01 Cable 20 7 5
3 01 Plastic 25 8 5
3 01 Brackets 13 9 5
1 02 Interior 500 10 None
2 02 Doors 300 11 10
3 02 Knobs 40 12 11
2 02 Trim 200 13 10
3 02 Shelves 150 14 13
4 02 Brackets 50 15 14
Here is one way to do this but I think there should be other better/efficient solution to this.
library(dplyr)
library(purrr)
df %>%
mutate(ID = row_number()) %>%
group_by(Group) %>%
mutate(Parent = map2_dbl(Level, ID, ~{
tmp <- ID[Level == .x - 1]
val <- tmp - .y
if(any(val < 0)) max(tmp[val < 0]) else NA
})) %>%
ungroup
# Level Group Name Price ID Parent
# <int> <int> <chr> <int> <int> <dbl>
# 1 1 1 Electrical 200 1 NA
# 2 2 1 Lights 82 2 1
# 3 3 1 Wiring 60 3 2
# 4 3 1 Bulbs 22 4 2
# 5 2 1 Vent 118 5 1
# 6 3 1 Detector 60 6 5
# 7 3 1 Cable 20 7 5
# 8 3 1 Plastic 25 8 5
# 9 3 1 Brackets 13 9 5
#10 1 2 Interior 500 10 NA
#11 2 2 Doors 300 11 10
#12 3 2 Knobs 40 12 11
#13 2 2 Trim 200 13 10
#14 3 2 Shelves 150 14 13
#15 4 2 Brackets 50 15 14
We create and ID
column which is row number in the dataframe and for each Level
in the group we find out the nearest ID
value which is one level above the current level.