Tags: r, for-loop, dplyr, iteration, apply

Change assignment in a column based on occurrences in each row of the same value in other columns


I have this dataset:

df <- structure(list(ID = c(1, 2, 3, 4, 6, 7), V = c(0, 0, 1, 1, 
1, 0), Mus = c(1, 0, 1, 1, 1, 0), R = c(1, 0, 1, 1, 1, 1), 
    E = c(1, 0, 0, 1, 0, 0), S = c(1, 0, 1, 1, 1, 0), t = c(0, 
    0, 0, 1, 0, 0), score = c(1, 0.4, 1, 0.4, 0.4, 0.4)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"), na.action = structure(c(`5` = 5L, 
`12` = 12L, `15` = 15L, `21` = 21L, `22` = 22L, `23` = 23L, `34` = 34L, 
`44` = 44L, `46` = 46L, `52` = 52L, `56` = 56L, `57` = 57L, `58` = 58L
), class = "omit"))

I would like to reassign the score column as follows:

  1. for each ID, if the number 1 occurs more than 3 times in the row, the last column should be 1.

  2. for each ID, if the number 1 occurs exactly 3 times in the row, the last column should be 0.4.

  3. for each ID, if the number 1 occurs fewer than 3 times in the row, the last column should be 0.

Could you please suggest a way to do this with a for loop, dplyr, map, or the apply functions?

Thanks


Solution

  • This should work: it counts the 1s in each row into a new ones column, then applies the conditions using case_when:

    library(tidyverse)
    
    
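    df |> 
      rowwise() |>                       # evaluate the mutate() below one row at a time
      mutate(ones = sum(c_across(V:t)),  # sum of the 0/1 columns V to t, i.e. the number of 1s
             score = case_when(
               ones  > 3 ~ 1,
               ones == 3 ~ 0.4,
               ones  < 3 ~ 0
             ))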
    #> # A tibble: 6 × 9
    #> # Rowwise: 
    #>      ID     V   Mus     R     E     S     t score  ones
    #>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    #> 1     1     0     1     1     1     1     0     1     4
    #> 2     2     0     0     0     0     0     0     0     0
    #> 3     3     1     1     1     0     1     0     1     4
    #> 4     4     1     1     1     1     1     1     1     6
    #> 5     6     1     1     1     0     1     0     1     4
    #> 6     7     0     0     1     0     0     0     0     1
    

    To make it tidier, you can call sum(c_across(V:t)) directly inside case_when and skip the extra variable (though the sum is then recomputed for each condition):

    df |> 
      rowwise() |> 
      mutate(score = case_when(
               sum(c_across(V:t))  > 3 ~ 1,
               sum(c_across(V:t)) == 3 ~ 0.4,
               sum(c_across(V:t)) < 3 ~ 0
             ))
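
If the row-by-row evaluation turns out to be slow on a larger table, a vectorised variant may be worth trying. The following is only a sketch in base R, not part of the answer above: it rebuilds the example data as a plain data frame (without the na.action attribute), and the helper names indicator_cols and ones are my own. It assumes the columns to count are exactly V, Mus, R, E, S and t.

```r
# Example data from the question, rebuilt as a plain data frame
df <- data.frame(
  ID    = c(1, 2, 3, 4, 6, 7),
  V     = c(0, 0, 1, 1, 1, 0),
  Mus   = c(1, 0, 1, 1, 1, 0),
  R     = c(1, 0, 1, 1, 1, 1),
  E     = c(1, 0, 0, 1, 0, 0),
  S     = c(1, 0, 1, 1, 1, 0),
  t     = c(0, 0, 0, 1, 0, 0),
  score = c(1, 0.4, 1, 0.4, 0.4, 0.4)
)

# Columns to inspect (assumption: exactly these six)
indicator_cols <- c("V", "Mus", "R", "E", "S", "t")

# Count the 1s in every row in one call instead of row by row
ones <- rowSums(df[, indicator_cols] == 1)

# Apply the three rules; the nested ifelse() mirrors the case_when() branches
df$score <- ifelse(ones > 3, 1, ifelse(ones == 3, 0.4, 0))

print(df)
```

Comparing with == 1 before summing means the count still works if a column ever holds values other than 0 and 1, which plain sum() would not guarantee.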