
R 4.1.2: Dynamically check values for a consecutive pattern and null all following values if that pattern occurs at any point


This relates to another problem I posted, but I did not quite ask the right question. If anyone can help with this, it would really be appreciated.

I have a DF with several players' answers to 100 questions in a quiz. The example data frame below has 10 questions and 10 players; it is not the real data (which is not actually from a quiz), but the principle is the same.

My goal is to create a function that checks when a player has answered 3 questions incorrectly in a row at any point in their answers, and then changes all of their following answers to the string "disc". I would also like the threshold to be a parameter, so it could be 4 or 5 incorrect answers etc. In the df: 1 = correct, 0 = incorrect, and 2 = unanswered. Unanswered counts as incorrect, but I do not want to recode it as 0.

df=data.frame(playerID=numeric(),
              q1=numeric(),
              q2=numeric(),
              q3=numeric(),
              q4=numeric(),
              q5=numeric(),
              q6=numeric(),
              q7=numeric(),
              q8=numeric(),
              q9=numeric(),
              q10=numeric())

set.seed(1)
# Fill row i with the player ID followed by 10 random answers (0, 1 or 2)
for(i in 1:10){
  list_i=c(i,sample(0:2,1),sample(0:2,1),sample(0:2,1),sample(0:2,1),sample(0:2,1),sample(0:2,1),sample(0:2,1),sample(0:2,1),sample(0:2,1),sample(0:2,1))
  df[i,]=list_i
}

So, in this DF, for example, players 3, 8 and 9 should have their answers set to "disc" from q4 onwards, whereas player 5 should have "disc" from q8 onwards. In other words, any time there are 3 consecutive incorrect answers (counting values of 2 as incorrect), all following answers should change to "disc".
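As a worked illustration of that rule for a single player, here is a minimal sketch that uses base R's `rle()` to find the first answer that should become "disc". The helper name `first_disc_position()` and its `n_wrong` argument are hypothetical (not from the question); answers of 0 and 2 both count as incorrect.

```r
# Hypothetical helper: index of the first answer that should become "disc",
# or NA if the player never has n_wrong incorrect answers in a row.
# The result can exceed length(answers) when the run ends on the last question.
first_disc_position <- function(answers, n_wrong = 3) {
  incorrect <- answers %in% c(0, 2)        # 2 (unanswered) counts as incorrect
  runs <- rle(incorrect)                   # runs of correct/incorrect answers
  run_ends <- cumsum(runs$lengths)
  run_starts <- run_ends - runs$lengths + 1
  # First run of incorrect answers that is at least n_wrong long
  hit <- which(runs$values & runs$lengths >= n_wrong)[1]
  if (is.na(hit)) return(NA_integer_)
  as.integer(run_starts[hit] + n_wrong)    # answer right after the n_wrong-th miss
}

first_disc_position(c(1, 0, 2, 0, 1, 1, 1, 1, 1, 1))  # 5: "disc" from q5 onwards
first_disc_position(c(0, 2, 2, 2, 0, 1, 1, 1, 1, 1))  # 4
first_disc_position(c(1, 0, 2, 1, 0, 0, 1, 2, 0, 1))  # NA: never 3 in a row
```

Note that a run longer than `n_wrong` (such as the four misses in the second example) still triggers "disc" right after the `n_wrong`-th miss, not at the end of the run.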

I presume this would be a for loop with an if statement inside, using mutate or similar.
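For comparison, the for-loop-plus-if approach can be sketched in base R without `mutate`. This is a minimal sketch, not the accepted answer: `disc_answers()` and the `toy` data frame are hypothetical, and it assumes the question columns are exactly those whose names start with "q". The question columns are converted to character so that they can hold "disc".

```r
# Hypothetical function: replace every answer after the n_wrong-th consecutive
# incorrect answer (0 or 2) with "disc", using plain for loops and an if.
disc_answers <- function(df, n_wrong = 3) {
  q_cols <- grep("^q", names(df))                  # question columns (q1, q2, ...)
  out <- df
  out[q_cols] <- lapply(df[q_cols], as.character)  # columns must be able to hold "disc"
  for (i in seq_len(nrow(df))) {
    streak <- 0                                    # current run of incorrect answers
    discard <- FALSE                               # has the threshold been reached?
    for (j in q_cols) {
      if (discard) {
        out[i, j] <- "disc"
      } else if (df[i, j] %in% c(0, 2)) {
        streak <- streak + 1
        if (streak >= n_wrong) discard <- TRUE
      } else {
        streak <- 0                                # a correct answer resets the run
      }
    }
  }
  out
}

toy <- data.frame(playerID = 1:2,
                  q1 = c(1, 0), q2 = c(0, 2), q3 = c(2, 2),
                  q4 = c(0, 1), q5 = c(1, 0))
disc_answers(toy)
```

The threshold is just an argument, so `disc_answers(df, n_wrong = 4)` would apply a limit of 4 to the question's data frame. Loops like this are slow on very large data, but with 100 questions per player they should be fine.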


Solution

  • One possible solution, using dplyr's mutate() and across():

    library(dplyr)

    n_wrong <- 3  # number of consecutive incorrect answers that triggers "disc"

    df %>%
      ungroup() %>%
      mutate(
        # Mutate across all question columns
        across(
          starts_with("q"),
          function(col) {
            # Position of the current column (q1 is column 2, after playerID).
            # Note: cur_data() is deprecated in dplyr >= 1.1.0 in favor of pick().
            col_i <- which(names(cur_data()) == cur_column())

            # q1 has no previous questions, so it is never discarded
            if (col_i <= 2) return(col)
            previous_cols <- 2:(col_i - 1)

            # Encode previous answers as one string per player: "1" = incorrect (0 or 2), "0" = correct
            previous_qs <- select(cur_data(), all_of(previous_cols)) %>%
              mutate(across(everything(), ~ as.numeric(.x %in% c(0, 2)))) %>%
              tidyr::unite("str", sep = "") %>%
              pull(str)

            # Check for n_wrong successive incorrect answers at some previous point
            results <- grepl(pattern = strrep("1", n_wrong), previous_qs)

            # For those players, overwrite the value with "disc"
            col[results] <- "disc"
            col
          }
        )
      )
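A base R variant of the same string-encoding idea (not part of the original answer) encodes each player's whole row once and lets `regexpr()` find where the first run of incorrect answers starts, so no per-column pass is needed. This is a sketch under assumptions: `disc_by_regex()` and `toy` are hypothetical names, and, like the answer, it assumes the question columns are named q1, q2, and so on.

```r
# Hypothetical function: encode each row as a string of "1" (incorrect) and
# "0" (correct), then use regexpr() to locate the first run of n_wrong "1"s.
disc_by_regex <- function(df, n_wrong = 3) {
  q_cols <- grep("^q", names(df))
  # One string per player: "1" = incorrect (0 or 2), "0" = correct
  flags <- apply(as.matrix(df[q_cols]), 1,
                 function(a) paste(as.integer(a %in% c(0, 2)), collapse = ""))
  # Start of the first run of n_wrong incorrect answers (-1 when there is none)
  run_start <- regexpr(strrep("1", n_wrong), flags)
  out <- df
  out[q_cols] <- lapply(df[q_cols], as.character)
  for (i in which(run_start > 0)) {
    first_disc <- run_start[i] + n_wrong   # position within the question columns
    if (first_disc <= length(q_cols)) {
      out[i, q_cols[first_disc:length(q_cols)]] <- "disc"
    }
  }
  out
}

toy <- data.frame(playerID = 1:2,
                  q1 = c(1, 0), q2 = c(0, 2), q3 = c(2, 2),
                  q4 = c(0, 1), q5 = c(1, 0))
disc_by_regex(toy)
disc_by_regex(toy, n_wrong = 2)
```

Because `regexpr()` returns the first match only, "disc" always starts right after the earliest qualifying run, which matches the rule described in the question.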