Search code examples
dplyr

In dplyr, how do I write a case_when formula that looks at a column containing variable names and then assesses the value of those variables?


I have a dataframe in which one column, last_question, contains a list of variable names which are the last questions that various participants viewed on a survey.

id <- c(1, 2, 3, 4, 5)
q1 <- c('NA', 'Italy', 'Peru', 'China', 'Vermont')
q2 <-c('d',NA,'y',NA,'we')
q3 <- c(NA,'m',NA,NA,'op')
q4 <- c(123,NA,NA,NA,029)
last_question <-c("q1","q4","q2","q3","q4")

test <- data.frame(id, q1,q2, q3, q4, last_question)

  id        q1   q2   q3 q4 last_question
1  1 Australia <NA> <NA> NA            q1
2  2     Italy <NA>    m NA            q4
3  3      Peru    y <NA> NA            q2
4  4     China <NA> <NA> NA            q4
5  5   Vermont   we   op 29            q4

I want to write a case_when formula that assigns the label "yes" to anyone for whom the answer to their last question is not null. Here is my expected output:

  id        q1   q2   q3 q4 last_question completion
1  1 Australia <NA> <NA> NA            q1        yes
2  2     Italy <NA>    m NA            q4         no
3  3      Peru    y <NA> NA            q2        yes
4  4     China <NA>  wsp NA            q4         no
5  5   Vermont   we   op 29            q4        yes

I can't quite figure out how to tell the case_when formula to look at the contents of last_question and see variable names and then go assess those variables. I tried

test <-test%>%
mutate(completion=case_when(!is.na(get(last_question))~"yes",TRUE~"No"))

and I got this error:

Error in mutate(): ℹ In argument: completion = case_when(...). Caused by error in case_when(): ! Failed to evaluate the left-hand side of formula 4. Caused by error in get(): ! first argument has length > 1

Thank you in advance for your assistance!


Solution

  • I would probably do something like this!

    library(tidyverse)
    
    
    test |>
      mutate(q4 = as.character(q4)) |>
      pivot_longer(cols = c(q1:q4),
                   names_to = "question",
                   values_to = "answer") |>
      mutate(completed = ifelse(is.na(answer), "No", "Yes"), .by = last_question) |>
      pivot_wider(names_from = question,
                  values_from = answer )
    #> # A tibble: 9 × 7
    #>      id last_question completed q1      q2    q3    q4   
    #>   <dbl> <chr>         <chr>     <chr>   <chr> <chr> <chr>
    #> 1     1 q1            Yes       NA      d     <NA>  123  
    #> 2     1 q1            No        <NA>    <NA>  <NA>  <NA> 
    #> 3     2 q4            Yes       Italy   <NA>  m     <NA> 
    #> 4     2 q4            No        <NA>    <NA>  <NA>  <NA> 
    #> 5     3 q2            Yes       Peru    y     <NA>  <NA> 
    #> 6     3 q2            No        <NA>    <NA>  <NA>  <NA> 
    #> 7     4 q3            Yes       China   <NA>  <NA>  <NA> 
    #> 8     4 q3            No        <NA>    <NA>  <NA>  <NA> 
    #> 9     5 q4            Yes       Vermont we    op    29
    

    Created on 2024-06-11 with reprex v2.1.0