Struggling with a for loop in R

I am trying to subset a dataset of a firm's employees according to "time_passed", the number of years each person has spent in the firm, into categories (people who have spent 0 to 5 years, others 6 to 10, others 11 to 15, etc., moving up in steps of 5 years). I imagine this is possible without a for-loop, but I would like to be able to do it both with a for-loop and with split (or subset, or any other R function).

Here is the structure of my dataset:

 structure(list(sex = c("F", "H", "F", "F", "H", "F"), age = c("24", 
 "33", "53", "32", "38", "21"), time_passed = c("0", "3", "4", 
 "0", "2", "0"), level = c("N7  ", "N7  ", "N9  ", "N7  ", "N8  ", 
 "    "), wage = c("2605", "4931", "11123", "3750", "6180", "858.31"
 )), row.names = c(NA, 6L), class = "data.frame")

And here is the for-loop I have tried, unsuccessfully:

 list_tranches <- c()

for (i in seq(from = 5, to = 40, by=5)) {
  for (j in 1:nrow(data_2021)){
    if(data_2021[j,4] %in% seq(i-5+1:i))
    tranche_i <- data_2021[j,]
    list_tranches <- c(list_tranches, tranche_i)
  }
}

Ultimately, I want to add a variable "tranche" to my dataset, indicating which time_passed category each individual falls into (0 to 5 years, 6 to 10 years, etc.). How should I proceed?

Solution

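Are you looking for findInterval or cut, followed by split? Note that time_passed is stored as character in your data, so it has to be converted to a number before it can be binned.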

data_2021 <-
  structure(list(
    sex = c("F", "H", "F", "F", "H", "F"), 
    age = c("24", "33", "53", "32", "38", "21"), 
    time_passed = c("0", "3", "4", "0", "2", "0"), 
    level = c("N7  ", "N7  ", "N9  ", "N7  ", "N8  ", "    "), 
    wage = c("2605", "4931", "11123", "3750", "6180", "858.31")), 
    row.names = c(NA, 6L), 
    class = "data.frame")

data_2021$time_passed <- as.integer(data_2021$time_passed)

breaks <- seq(0, 49, by = 5)
ff <- findInterval(data_2021$time_passed, breaks)
split(data_2021, ff)
#> $`1`
#>   sex age time_passed level   wage
#> 1   F  24           0  N7     2605
#> 2   H  33           3  N7     4931
#> 3   F  53           4  N9    11123
#> 4   F  32           0  N7     3750
#> 5   H  38           2  N8     6180
#> 6   F  21           0       858.31

cc <- cut(data_2021$time_passed, breaks = breaks, include.lowest = TRUE)
cc <- droplevels(cc)
split(data_2021, cc)
#> $`[0,5]`
#>   sex age time_passed level   wage
#> 1   F  24           0  N7     2605
#> 2   H  33           3  N7     4931
#> 3   F  53           4  N9    11123
#> 4   F  32           0  N7     3750
#> 5   H  38           2  N8     6180
#> 6   F  21           0       858.31
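On this sample the two approaches give the same single group, but they do not bin boundary values the same way, which matters for the 0 to 5, 6 to 10 categories in the question. The sketch below uses a made-up vector `years` (not part of the original data) that sits on and next to the break points. By default findInterval uses left-closed bins, while cut with include.lowest = TRUE uses right-closed bins; as far as I understand the documentation, passing left.open = TRUE and rightmost.closed = TRUE makes findInterval follow the same convention as cut.

```r
breaks <- seq(0, 49, by = 5)

# Illustrative values (an assumption, not the asker's data), chosen to fall
# on and around the break points.
years <- c(0L, 4L, 5L, 6L, 10L, 11L)

# Default findInterval(): bins are [0,5), [5,10), ... so 5 opens the second bin.
fi_default <- findInterval(years, breaks)

# cut() with right-closed bins: [0,5], (5,10], ... so 5 stays in the first bin.
cc <- cut(years, breaks = breaks, include.lowest = TRUE)

# findInterval() asked to use the same right-closed convention as cut().
fi_right <- findInterval(years, breaks,
                         left.open = TRUE, rightmost.closed = TRUE)

# Side-by-side view of the three binnings.
comparison <- data.frame(years, fi_default, cut = cc, fi_right)
print(comparison)
```

If the categories really are 0 to 5, 6 to 10 and so on, the cut version (or the right-closed findInterval call) is the one that matches them.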

Created on 2022-08-04 by the reprex package (v2.0.1)

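To add a new column tranche, split the data by the cut factor, write each group's name (taken from the result's names attribute) into that group, and bind the groups back together.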

cc <- cut(data_2021$time_passed, breaks = breaks, include.lowest = TRUE)
cc <- droplevels(cc)
sp <- split(data_2021, cc)
res <- lapply(seq_along(sp), \(i){
  sp[[i]]$tranche <- names(sp)[i]
  sp[[i]]
})
rm(sp)
res <- do.call(rbind, res)
res
#>   sex age time_passed level   wage tranche
#> 1   F  24           0  N7     2605   [0,5]
#> 2   H  33           3  N7     4931   [0,5]
#> 3   F  53           4  N9    11123   [0,5]
#> 4   F  32           0  N7     3750   [0,5]
#> 5   H  38           2  N8     6180   [0,5]
#> 6   F  21           0       858.31   [0,5]
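Since cut already returns one label per row, the column can also be assigned directly without splitting and re-binding. And because the question asks for a for-loop version too, here is one possible sketch. The loop in the question appears to fail for several reasons: column 4 is level rather than time_passed, `i-5+1:i` is parsed as `(i-5) + (1:i)` because `:` binds tighter than `+`, the `if` without braces only guards the next line, and `c()` flattens data frame rows into a plain list. The helper `tranche_by_loop` and its "lower to upper" labels are my own illustration, and it assumes whole years and 5-year bands.

```r
# Same six-row sample as above, with time_passed already converted to integer.
data_2021 <- data.frame(
  sex = c("F", "H", "F", "F", "H", "F"),
  age = c("24", "33", "53", "32", "38", "21"),
  time_passed = c(0L, 3L, 4L, 0L, 2L, 0L)
)

breaks <- seq(0, 49, by = 5)

# Vectorised route: cut() gives one label per row, so assign it directly.
data_2021$tranche <- as.character(
  cut(data_2021$time_passed, breaks = breaks, include.lowest = TRUE)
)

# Loop route (hypothetical helper): one pass per upper bound, filling in the
# label for every row whose value falls inside that band.
tranche_by_loop <- function(years, uppers = seq(5, 45, by = 5)) {
  out <- rep(NA_character_, length(years))
  for (upper in uppers) {
    # The first band also includes 0; later bands start one year after the
    # previous upper bound (assumes integer years and a step of 5).
    lower <- if (upper == uppers[1]) 0 else upper - 4
    in_band <- !is.na(years) & years >= lower & years <= upper
    out[in_band] <- paste(lower, "to", upper)
  }
  out
}

data_2021$tranche_loop <- tranche_by_loop(data_2021$time_passed)
print(data_2021)
```

The loop iterates over bands rather than over rows, so there is no need for the nested loop or for growing a list inside it.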
