
Using ifelse or case_when on a data frame in R


I am sure the solution to my problem is simple, but I am new to coding and cannot find the answer online. I am working on a dataset of qualitative data that was collected and coded. The dataset includes variables named Code1, Code2, Code3, and Code4. Each respondent has at least one code and can have several. I am trying to add a variable that reflects the number of codes given to each participant. A participant's data looks something like this, where the numerical values are the codes we assigned based on their response:

ID  Code1  Code2  Code3  Code4
1   5      NA     NA     NA
2   7      6      4      NA
3   5      12     NA     NA

The variable I want to add would be named Count and would look like this:

ID  Code1  Code2  Code3  Code4  Count
1   5      NA     NA     NA     1
2   7      6      4      NA     3
3   5      12     NA     NA     2

Participant 1 would have 1 under Count because they received only one code, participant 2 would have 3 because they have three codes, and participant 3 would have 2 because they were assigned two codes.

I have tried the ifelse function, testing for NA, since an NA signals that fewer codes were assigned. However, I cannot get more than two outcomes from it: my count variable ends up with at most two different values, while the true count can go up to 4. I have also tried case_when, but I get the error message: Error: Case 7 (!is.na(Code1) ~ 1) must be a two-sided formula, not a logical vector.

Here is an example of what I have tried:

df$count = ifelse(is.na(df$Code2),1,2)

df$count = ifelse(is.na(df$Code3),2,3)

df$count = ifelse(is.na(df$Code4),3,4)

I have also tried:

df <- df %>%
  mutate(count = case_when(!is.na(Code1) ~ 1, 
                                 !is.na(Code2) ~ 2, 
                                 !is.na(Code3) ~ 3,
                                 !is.na(Code4) ~ 4,
                                xor(Code1,Code2)))

I cannot figure out what I am doing wrong or how to get the count variable I need. Any suggestions?

Many thanks in advance!!


Solution

  • A dplyr approach using rowSums and across:

    library(dplyr, warn.conflicts = FALSE)
    
    dat <- dat |>
      mutate(count = rowSums(
        across(starts_with("Code"), ~ !is.na(.x))
      ))
    dat
    #>   ID Code1 Code2 Code3 Code4 count
    #> 1  1     5    NA    NA    NA     1
    #> 2  2     7     6     4    NA     3
    #> 3  3     5    12    NA    NA     2
    

    Or using base R:

    dat$count <- rowSums(
      !is.na(dat[grep("^Code", names(dat), value = TRUE)])
    )
    dat
    #>   ID Code1 Code2 Code3 Code4 count
    #> 1  1     5    NA    NA    NA     1
    #> 2  2     7     6     4    NA     3
    #> 3  3     5    12    NA    NA     2
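
As for why the ifelse attempts in the question give at most two values: each `df$count = ifelse(...)` line overwrites the result of the previous one, so only the last test (on Code4) survives. A minimal sketch of a nested version follows. It assumes codes are always filled left to right with no gaps (Code1 first, then Code2, and so on), which the example data suggests but the question does not guarantee. The rowSums approaches above do not need that assumption.

```r
# Example data mirroring the question
dat <- data.frame(
  ID    = c(1, 2, 3),
  Code1 = c(5L, 7L, 5L),
  Code2 = c(NA, 6L, 12L),
  Code3 = c(NA, 4L, NA),
  Code4 = c(NA, NA, NA)
)

# Nest the calls so each later test is only reached when the earlier
# code is present. Assumption: codes are filled left to right, no gaps.
dat$count <- ifelse(is.na(dat$Code2), 1,
             ifelse(is.na(dat$Code3), 2,
             ifelse(is.na(dat$Code4), 3, 4)))
dat$count
#> [1] 1 3 2
```
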
    

    DATA

    dat <- structure(
      list(
        ID = c(1, 2, 3),
        Code1 = c(5L, 7L, 5L),
        Code2 = c(NA, 6L, 12L),
        Code3 = c(NA, 4L, NA),
        Code4 = c(NA, NA, NA)
      ),
      class = "data.frame",
      row.names = c(NA, -3L)
    )
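
On the case_when error in the question: every argument to case_when has to be a two-sided formula of the form `condition ~ value`. The final argument, `xor(Code1,Code2)`, is a bare logical vector, which is most likely what triggers the "must be a two-sided formula" message. Separately, case_when returns the value for the first condition that is TRUE, so testing `!is.na(Code1)` first would give 1 for every row. The sketch below tests from Code4 downward instead. Like any such cascade, it assumes codes are filled left to right with no gaps, so rowSums remains the more general option.

```r
library(dplyr, warn.conflicts = FALSE)

# Example data mirroring the question
dat <- data.frame(
  ID    = c(1, 2, 3),
  Code1 = c(5L, 7L, 5L),
  Code2 = c(NA, 6L, 12L),
  Code3 = c(NA, 4L, NA),
  Code4 = c(NA, NA, NA)
)

# Check the highest-numbered code first: the first TRUE condition wins.
# Rows where every code is NA would get NA (no condition matches).
dat <- dat |>
  mutate(count = case_when(
    !is.na(Code4) ~ 4,
    !is.na(Code3) ~ 3,
    !is.na(Code2) ~ 2,
    !is.na(Code1) ~ 1
  ))
dat$count
#> [1] 1 3 2
```
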