I have a dataset with the following shape:
library(data.table)
# Test data
dt <- data.table(
  "Level" = c(2, 3, 5, 5, 6, 6, 4, 4, 3, 5, 6, 6, 6, 5, 6, 7),
  "Code.1" = c("01", "0101", "0101 21 00", "0101 29", "0101 29 10", "0101 29 90", "0101 30 00", "0101 90 00",
               "0102", "0102 21", "0102 21 20", "0102 21 30", "0102 21 90", "0102 29", "0102 29 05", "0102 29 10"),
  "Description" = c("CHAPTER 1 - LIVE ANIMALS",
                    "Live horses, asses, mules and hinnies",
                    "-- Pure-bred breeding animals",
                    "-- Other",
                    "--- For slaughter",
                    "--- Other",
                    "- Asses",
                    "- Other",
                    "Live bovine animals",
                    "-- Pure-bred breeding animals",
                    "--- Heifers (female bovines that have never calved)",
                    "--- Cows",
                    "--- Other",
                    "-- Other",
                    "--- Of the sub-genus Bibos or of the sub-genus Poephagus",
                    "---- Of a weight not exceeding 80 kg"
  )
)
I would like to generate a dataset in which the description of each upper level is repeated, from level 3 onwards. The level can be identified either by the Level column or by the number of "-" characters at the beginning of the description string. The aim is to obtain the following data table, perhaps with a separator such as ";" between the labels of each level:
dt <- data.table("Level" = c(2,3,5,5,6),
"Code.1"=c("01","0101","0101 21 00","0101 29","0101 29 10"),
"Description"=c("CHAPTER 1 - LIVE ANIMALS","Live horses, asses, mules and hinnies","-- Pure-bred breeding animals","-- Other","--- For slaughter"),
"LongDescription" = c("CHAPTER 1 - LIVE ANIMALS","Live horses, asses, mules and hinnies",
"Live horses, asses, mules and hinnies ; Pure-bred breeding animals",
"Live horses, asses, mules and hinnies ; Other",
"Live horses, asses, mules and hinnies ; Other ; For slaughter"))
Be aware that there can be many level 5 rows under a single level 3 row, for example; a solution that simply takes information from the previous row will therefore not work.
I changed the dt data to make the required output clearer. In this example, for the row with Code.1 = "0101 30 00", we should get LongDescription = "Live horses, asses, mules and hinnies;- Asses". So it is not really based on Level, but on the row's position inside the hierarchy of the Code.1 column.
Thanks for your feedback. The option below should work. I split the data into separate columns, one per level, then filled each column by carrying the last value forward whenever the current row has a higher Level. Finally, I united the columns to get the desired result.
library(dplyr)  # mutate(), select() and the %>% pipe
library(tidyr)  # unite()
## Create a separate column for each level (including level 7, which occurs in the data)
dt2 = dt %>% mutate(lvl3 = ifelse(Level == 3, Description, NA)) %>%
  mutate(lvl4 = ifelse(Level == 4, Description, NA)) %>%
  mutate(lvl5 = ifelse(Level == 5, Description, NA)) %>%
  mutate(lvl6 = ifelse(Level == 6, Description, NA)) %>%
  mutate(lvl7 = ifelse(Level == 7, Description, NA))
## Carry the parent labels forward from the previous row, depending on the level
for (row in seq_len(nrow(dt2))) {
  if (dt2$Level[row] > 3) {
    dt2$lvl3[row] = dt2$lvl3[row-1]
  }
  if (dt2$Level[row] > 4) {
    dt2$lvl4[row] = dt2$lvl4[row-1]
  }
  if (dt2$Level[row] > 5) {
    dt2$lvl5[row] = dt2$lvl5[row-1]
  }
  if (dt2$Level[row] > 6) {
    dt2$lvl6[row] = dt2$lvl6[row-1]
  }
}
## Combine the level columns into one description
dt3 = dt2 %>%
  unite(col = "Description2", c('lvl3','lvl4','lvl5','lvl6','lvl7'), na.rm=TRUE, sep = " ; ") %>%
  mutate(Long_Description = ifelse(Level<3, Description, Description2)) %>%
  select(-Description2)
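If the number of levels is not known in advance, the same carry-forward idea can be sketched without one column per level. The helper below is hypothetical (`add_long_description`, `top_level` and `sep` are names I made up for illustration, not part of the original post). It keeps the most recent label seen at each level and, for every row, joins the labels from level 3 down to the row's own level. It also assumes the leading dashes only encode depth and can be stripped, as in the desired output shown in the question.

```r
library(data.table)

# Hypothetical helper (not from the original post): returns one long
# description per row, built from the labels of all enclosing levels.
add_long_description <- function(dt, top_level = 3, sep = " ; ") {
  # Assumption: leading dashes only encode depth, so they are stripped.
  clean <- sub("^-+\\s*", "", dt$Description)
  max_level <- max(dt$Level)
  current <- rep(NA_character_, max_level)  # current[l] = active label at level l
  out <- character(nrow(dt))
  for (i in seq_len(nrow(dt))) {
    lvl <- dt$Level[i]
    current[lvl] <- clean[i]
    # A new label at level `lvl` invalidates every deeper label.
    if (lvl < max_level) current[(lvl + 1):max_level] <- NA_character_
    if (lvl < top_level) {
      # Rows above the top level keep their original description.
      out[i] <- dt$Description[i]
    } else {
      path <- current[top_level:lvl]
      out[i] <- paste(path[!is.na(path)], collapse = sep)
    }
  }
  out
}

# Small sample taken from the first rows of the question's data.
dt_small <- data.table(
  Level = c(2, 3, 5, 5, 6, 6, 4, 4),
  Code.1 = c("01", "0101", "0101 21 00", "0101 29", "0101 29 10",
             "0101 29 90", "0101 30 00", "0101 90 00"),
  Description = c("CHAPTER 1 - LIVE ANIMALS",
                  "Live horses, asses, mules and hinnies",
                  "-- Pure-bred breeding animals", "-- Other",
                  "--- For slaughter", "--- Other", "- Asses", "- Other")
)
long <- add_long_description(dt_small)
dt_small[, LongDescription := long]
```

On this sample, the "--- For slaughter" row becomes "Live horses, asses, mules and hinnies ; Other ; For slaughter" and the "- Asses" row becomes "Live horses, asses, mules and hinnies ; Asses". Because the loop still walks the rows in order, it relies on the table being sorted by Code.1, exactly like the column-based version above.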