
Struggling with a for loop in R


I am trying to split my dataset of a firm's employees into categories according to "time_passed", the number of years each person has spent in the firm: 0 to 5 years, 6 to 10, 11 to 15, and so on in 5-year steps. I imagine this is possible without a for-loop, but I would like to be able to do it both with a for-loop and with split (or subset, or any other R function).

Here is the structure of my dataset:

 structure(list(sex = c("F", "H", "F", "F", "H", "F"), age = c("24", 
 "33", "53", "32", "38", "21"), time_passed = c("0", "3", "4", 
 "0", "2", "0"), level = c("N7  ", "N7  ", "N9  ", "N7  ", "N8  ", 
 "    "), wage = c("2605", "4931", "11123", "3750", "6180", "858.31"
 )), row.names = c(NA, 6L), class = "data.frame")

And here is the for-loop I tried, unsuccessfully:

 list_tranches <- c()

 for (i in seq(from = 5, to = 40, by=5)) {
   for (j in 1:nrow(data_2021)){
     if(data_2021[j,4] %in% seq(i-5+1:i))
     tranche_i <- data_2021[j,]
     list_tranches <- c(list_tranches, tranche_i)
   }
 }

Ultimately, I want to add a variable "tranche" to my dataset, indicating for each individual which time_passed category they fall in (0 to 5 years, 6 to 10 years, etc.). How should I proceed?


Solution

  • Are you looking for findInterval or cut followed by split?

    data_2021 <-
      structure(list(
        sex = c("F", "H", "F", "F", "H", "F"), 
        age = c("24", "33", "53", "32", "38", "21"), 
        time_passed = c("0", "3", "4", "0", "2", "0"), 
        level = c("N7  ", "N7  ", "N9  ", "N7  ", "N8  ", "    "), 
        wage = c("2605", "4931", "11123", "3750", "6180", "858.31")), 
        row.names = c(NA, 6L), 
        class = "data.frame")
    
    data_2021$time_passed <- as.integer(data_2021$time_passed)
    
    breaks <- seq(0, 49, by = 5)
    ff <- findInterval(data_2021$time_passed, breaks)
    split(data_2021, ff)
    #> $`1`
    #>   sex age time_passed level   wage
    #> 1   F  24           0  N7     2605
    #> 2   H  33           3  N7     4931
    #> 3   F  53           4  N9    11123
    #> 4   F  32           0  N7     3750
    #> 5   H  38           2  N8     6180
    #> 6   F  21           0       858.31
    
    cc <- cut(data_2021$time_passed, breaks = breaks, include.lowest = TRUE)
    cc <- droplevels(cc)
    split(data_2021, cc)
    #> $`[0,5]`
    #>   sex age time_passed level   wage
    #> 1   F  24           0  N7     2605
    #> 2   H  33           3  N7     4931
    #> 3   F  53           4  N9    11123
    #> 4   F  32           0  N7     3750
    #> 5   H  38           2  N8     6180
    #> 6   F  21           0       858.31
    

    Created on 2022-08-04 by the reprex package (v2.0.1)
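
One detail worth checking before choosing between the two: a value lying exactly on a break (such as 5) does not land in the same group. By default findInterval uses left-closed intervals [0,5), [5,10), ..., while cut with include.lowest = TRUE uses [0,5], (5,10], ..., which is the version that matches "0 to 5, 6 to 10". The sketch below compares them on a few made-up values of x (not taken from the data) chosen to hit the boundaries:

```r
breaks <- seq(0, 45, by = 5)
x <- c(0L, 4L, 5L, 6L, 10L)   # hypothetical values, for illustration only

# Default findInterval: intervals are [0,5), [5,10), ... so 5 falls in group 2
fi_default <- findInterval(x, breaks)

# left.open = TRUE gives (0,5], (5,10], ...; rightmost.closed = TRUE then
# closes the leftmost interval, so 0 is still assigned to group 1
fi_left_open <- findInterval(x, breaks, left.open = TRUE, rightmost.closed = TRUE)

# cut with include.lowest = TRUE: [0,5], (5,10], ...
cc <- cut(x, breaks = breaks, include.lowest = TRUE)

data.frame(x, fi_default, fi_left_open, cut = as.character(cc))
```

With the sample data in the question every time_passed is below 5, so both approaches agree there; the difference only shows up once an employee has exactly 5, 10, 15, ... years.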


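    To add a new column tranche, split on the cut result, label each piece with its name from the list's names attribute, and bind the pieces back together. Note that the rows come back grouped by tranche, which can differ from the original row order.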

    cc <- cut(data_2021$time_passed, breaks = breaks, include.lowest = TRUE)
    cc <- droplevels(cc)
    sp <- split(data_2021, cc)
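    # label every piece with the name of its interval
    # (the \(i) lambda shorthand needs R >= 4.1; use function(i) otherwise)
    res <- lapply(seq_along(sp), \(i){
      sp[[i]]$tranche <- names(sp)[i]
      sp[[i]]
    })
    rm(sp)
    # stack the labelled pieces back into a single data frame
    res <- do.call(rbind, res)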
    res
    #>   sex age time_passed level   wage tranche
    #> 1   F  24           0  N7     2605   [0,5]
    #> 2   H  33           3  N7     4931   [0,5]
    #> 3   F  53           4  N9    11123   [0,5]
    #> 4   F  32           0  N7     3750   [0,5]
    #> 5   H  38           2  N8     6180   [0,5]
    #> 6   F  21           0       858.31   [0,5]
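
If the split pieces are not needed for anything else, the factor returned by cut can probably just be stored as the new column, which skips the split/rbind round trip and keeps the original row order. A minimal sketch, assuming the same 5-year breaks as above; the data frame is rebuilt here with only two of the columns so the block runs on its own:

```r
data_2021 <- data.frame(
  sex = c("F", "H", "F", "F", "H", "F"),
  time_passed = as.integer(c("0", "3", "4", "0", "2", "0"))
)

breaks <- seq(0, 45, by = 5)

# One factor level per 5-year bracket: [0,5], (5,10], ..., (40,45]
data_2021$tranche <- cut(data_2021$time_passed, breaks = breaks,
                         include.lowest = TRUE)

# Rows per bracket; brackets with no employee show a count of 0
table(data_2021$tranche)
```

Values outside the range of breaks get NA from cut, so the breaks should cover the longest tenure in the real data.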
    

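
Since the question also asks for a for-loop version, here is one possible sketch. As far as I can tell, the loop in the question fails for several reasons: column 4 is level rather than time_passed; `i-5+1:i` is parsed as `(i-5) + (1:i)` because `:` binds tighter than `+`; an `if` without braces only controls the next line; and `tranche_i` is a fixed name, so `i` is never substituted into it. The version below loops over the brackets only and labels the matching rows in one vectorised step. It assumes time_passed holds whole years, and the "0-5" style labels are my own choice:

```r
data_2021 <- data.frame(
  sex = c("F", "H", "F", "F", "H", "F"),
  time_passed = c("0", "3", "4", "0", "2", "0")
)
# time_passed is stored as text in the question's data, so convert it first
data_2021$time_passed <- as.integer(data_2021$time_passed)

data_2021$tranche <- NA_character_

# Each pass handles one bracket; `upper` is its upper bound (5, 10, ..., 40)
for (upper in seq(from = 5, to = 40, by = 5)) {
  # First bracket is 0-5, later ones are 6-10, 11-15, ...
  lower <- if (upper == 5) 0 else upper - 4
  in_bracket <- data_2021$time_passed >= lower & data_2021$time_passed <= upper
  data_2021$tranche[in_bracket] <- paste(lower, upper, sep = "-")
}

# The loop only labels rows; split() then gives one data frame per bracket
list_tranches <- split(data_2021, data_2021$tranche)
```

Rows whose time_passed is above 40 keep NA as their tranche here, mirroring the upper limit used in the question's loop.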