
Return NA when all columns are NAs in dplyr


I'm doing some data wrangling on my raw data to get it ready for analysis.

I'm creating a variable HEART that equals 1 when any of HEART1, CARDIO, ANGINA equals 1; when HEART1, CARDIO, and ANGINA all equal 0, HEART equals 0. When all three columns are NA, HEART should be NA.

ID <- c(1,1,1,2,2,2,3,3,3)
HEART1 <- c(1,0,NA,0,0,0,0,0,NA)
CARDIO <- c(1,0,0,0,0,0,0,1,NA)
ANGINA <- c(1,0,1,0,0,0,0,1,NA)
SLEEP <- c(1,1,1,0,0,0,0,0,0)

df <- data.frame(ID, HEART1, CARDIO, ANGINA)

So the HEART column will be (1,0,1,0,0,0,0,1,NA)

How do I do it with the mutate() function in dplyr? I've heard of the if_all() function but how do I only select HEART1, CARDIO, ANGINA and leave SLEEP out of it?


Solution

  • This will spit out some warnings about coercing the columns to logical, but those can be ignored.

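    library(dplyr)
    df |>
      rowwise() |>  # evaluate any() one row at a time
      mutate(
        # Name the columns explicitly: starts_with("heart") would match only HEART1 in df.
        # any() is TRUE if any value is 1, NA if none is 1 but some are missing, else FALSE.
        heart = as.integer(any(c_across(c(HEART1, CARDIO, ANGINA))))
      )
    # # A tibble: 9 × 5
    # # Rowwise: 
    #      ID HEART1 CARDIO ANGINA heart
    #   <dbl>  <dbl>  <dbl>  <dbl> <int>
    # 1     1      1      1      1     1
    # 2     1      0      0      0     0
    # 3     1     NA      0      1     1
    # 4     2      0      0      0     0
    # 5     2      0      0      0     0
    # 6     2      0      0      0     0
    # 7     3      0      0      0     0
    # 8     3      0      1      1     1
    # 9     3     NA     NA     NA    NA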
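
Since the question asks about if_all(), here is a rough sketch of a vectorized alternative that avoids rowwise() and should not trigger the coercion warnings. It assumes dplyr 1.0.4 or later (where if_any() and if_all() are available). The heart_cols vector and the result name are hypothetical helpers introduced for illustration, and SLEEP is added to the data frame only to show that it is left out of the selection. One difference in behaviour: a row with some NA and no 1 (for example NA, 0, 0) gets 0 here, whereas the any() approach would return NA for it.

```r
library(dplyr)

# Same data as in the question, with SLEEP included to show it is ignored
df <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  HEART1 = c(1, 0, NA, 0, 0, 0, 0, 0, NA),
  CARDIO = c(1, 0, 0, 0, 0, 0, 0, 1, NA),
  ANGINA = c(1, 0, 1, 0, 0, 0, 0, 1, NA),
  SLEEP  = c(1, 1, 1, 0, 0, 0, 0, 0, 0)
)

# Hypothetical helper: the columns that feed into HEART (SLEEP is not listed)
heart_cols <- c("HEART1", "CARDIO", "ANGINA")

result <- df |>
  mutate(
    HEART = case_when(
      # every selected column is missing -> NA
      if_all(all_of(heart_cols), is.na) ~ NA_integer_,
      # at least one selected column equals 1 (%in% treats NA as "not 1")
      if_any(all_of(heart_cols), ~ .x %in% 1) ~ 1L,
      # otherwise no 1 was found
      TRUE ~ 0L
    )
  )

result$HEART
```

Listing the columns by name with all_of() is what keeps SLEEP out; a helper such as starts_with() would only work if the columns shared a common prefix.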