Search code examples
rrowrowsumrowwise

Deal with missing data in row-wise manipulation


ID <- 1:6
math <- c("YES","NO","YES","NO",NA,NA)
history <- c(NA,NA,"NO","NO","YES",NA)

dt <- data.frame(ID, math, history)

  ID math history
1  1  YES    <NA>
2  2   NO    <NA>
3  3  YES      NO
4  4   NO      NO
5  5 <NA>     YES
6  6 <NA>    <NA>

I want to make an additional column ("pass") as follows

  1. If a student ever had "yes" at least once: "YES" (No matter if another subject is missing data or not or so.)

  2. If the student did not get "yes"

    • If both subjects are missing data : NA
    • if one of subjects is "NO" : "NO"

so, the columns would like this: (I can do this manually with this minimal example. but not with my real data)

> dt
  ID math history pass
1  1  YES    <NA>  YES
2  2   NO    <NA>   NO
3  3  YES      NO  YES
4  4   NO      NO   NO
5  5 <NA>     YES  YES
6  6 <NA>    <NA> <NA>

I tried to use

dt$pass <- ifelse(rowSums(dt[,-1]=="YES",na.rm=T)>0,"YES","NO")

this code, but it was tricky because if I put na.rm=TRUE they consider NA is "NO" (ID 6 student will be "NO")

if I put na.rm=FALSE, an only students that have both subject's data are considered.

In my data, I have really lots of columns, not only math and history.


Solution

  • A simple base solution is

    dt$pass <- apply(dt[-1], 1, \(x) sort(x, dec = TRUE)[1])
    
    # > dt
    #   ID math history pass
    # 1  1  YES    <NA>  YES
    # 2  2   NO    <NA>   NO
    # 3  3  YES      NO  YES
    # 4  4   NO      NO   NO
    # 5  5 <NA>     YES  YES
    # 6  6 <NA>    <NA> <NA>
    

    Its dplyr equivalent is

    library(dplyr)
    
    dt %>%
      rowwise() %>%
      mutate(pass = sort(c_across(-1), dec = TRUE)[1]) %>%
      ungroup()