I am trying to use rowSums in sparklyr to create an indicator variable that flags rows where all the variables are missing, but rowSums doesn't seem to work in sparklyr.
As shown below, I have to write the name of every variable in its own is.na() call, which is impractical since I have 100 variables.
y <- c(NA,1,2)
x <- c(NA,NA,3)
z <- c(NA,NA,NA)
dt = data.frame(x,y,z)
sdf_copy_to(sc, dt)
dt %>%
  mutate(new = ifelse(is.na(x) & is.na(y) & is.na(z), 1, 0))
Is there any way to pass multiple variables to is.na() at once?
library(dplyr)  # provides %>% and mutate(), used below
library(rlang)
library(glue)
# Create a character vector with the names of all the variables of interest.
# I am using all of them for simplicity; otherwise select a subset with a regex (e.g., grep).
cols_of_interest <- names(dt)
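# One "is.na(<column>)" string per column, joined with "&" inside a single ifelse() call
na_checks <- glue("is.na({cols_of_interest})")
test_string <- glue("ifelse({glue_collapse(na_checks, sep = ' & ')}, yes = 1, no = 0)")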
# Parse the string into an expression with rlang and unquote it inside mutate()
dt %>% mutate(flag = !!rlang::parse_expr(test_string))
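If you would rather not build and parse a string, the same condition can be assembled directly from symbols. This is only a sketch, run here on a local data frame rather than a Spark table; the names na_checks and all_missing are made up for illustration, and I am assuming (not verifying) that dbplyr translates the resulting is.na(...) & is.na(...) expression the same way it translates the hand-written version in the question.

```r
library(dplyr)
library(rlang)

# Local stand-in for the data in the question
dt <- data.frame(x = c(NA, NA, 3), y = c(NA, 1, 2), z = c(NA, NA, NA))
cols_of_interest <- names(dt)

# One unevaluated is.na(<column>) call per column name
na_checks <- lapply(syms(cols_of_interest), function(col) expr(is.na(!!col)))

# Fold the calls together with `&`: is.na(x) & is.na(y) & is.na(z)
all_missing <- Reduce(function(lhs, rhs) expr(!!lhs & !!rhs), na_checks)

# Unquote the combined condition inside mutate()
result <- dt %>% mutate(flag = ifelse(!!all_missing, 1, 0))
```

Building the expression from symbols avoids quoting problems when column names contain spaces or other characters that would not survive being pasted into a string.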