I have data such as this:
data_in <- read_table2("Id Q62_1 Q62_2 Q3_1 Q3_2 Q3_3 Q3_4 Q3_5
1 Yes Sometimes
2 Always
3
4 No Always Yes
5
6 Always No Likely Yes Always Always
7 Yes Sometimes Maybe Unlikely Sometimes Sometimes
8 Always Yes Likely No Always Always
9 Sometimes Unlikely Sometimes Sometimes
10 No No Likely Maybe
11 Sometimes Maybe Unlikely Sometimes Sometimes
12 Always Yes Likely Always Always
")
I would like to calculate the number of missing response within columns that start with Q62 and then from columns Q3_1 to Q3_5 separately.
I know that rowSums is handy to sum numeric variables, but is there a dplyr/piped equivalent to sum na's?
For example, if this were numeric data and I wanted to sum the q62 series, I could use the following:
data_in %>%
mutate(Q62_NA = rowSums(select(.,"Q62_1", "Q62_2"))
But how do I sum NAs?
My output should look something like this:
data_out <- read_table2("Id Q62_1 Q62_2 Q3_1 Q3_2 Q3_3 Q3_4 Q3_5 Q62_NA Q3_NA
1 Yes Sometimes 0 5
2 Always 1 5
3 2 5
4 No Always Yes 0 5
5 2 5
6 Always No Likely Yes Always Always 1
7 Yes Sometimes Maybe Unlikely Sometimes Sometimes 0 1
8 Always Yes Likely No Always Always 1 0
9 Sometimes Unlikely Sometimes Sometimes 1 1
10 No No Likely Maybe 1 2
11 Sometimes Maybe Unlikely Sometimes Sometimes 1 1
12 Always Yes Likely Always Always 1 1
")
Thank you!!
We can wrap the select
with is.na
to convert it to a logical matrix
and then do the rowSums
on that matrix to sum the number of TRUE elements per row
library(dplyr)
data_in %>%
mutate(Q62_NA = rowSums(is.na(select(.,"Q62_1", "Q62_2"))))
Or an option with c_across
and rowwise
data_in %>%
rowwise %>%
mutate(Q62_NA = sum(is.na(c_across(starts_with('Q6')))))