I am trying to to group_by program names and count the total column amounts per program. There are NA's in the mix.
Taking a data frame like this: dataframe
and getting something like in return returned data
The following code only counts all non NA observations. It doesn't actually add up the numbers. Do I need to do some sort of ifelse here? I also wonder if the !is.na causes it to count all non NA observations, however, if I remove it, I get all NA's as my totals.
df %>%
group_by(ProgramName) %>%
summarise(ED = sum(!is.na(HighSchool)), EMP = sum(!is.na(Employment)))
alternatively, is there is a way to group by program name and count the observations ONLY if they had a 1 in either column, not to tally up the total? That is closer to what I want anyway. Any support would be appreciated.
To answer both questions:
library(dplyr)
df <- structure(list(ProgramName = c("Program A", "Program A", "Program A",
"Program A", "Program B", "Program B", "Program B", "Program B",
"Program C", "Program C", "Program C", "Program C"), HighSchool = c(1L,
0L, 0L, 1L, 1L, 0L, 1L, NA, 1L, 1L, NA, 1L), Employment = c(0L,
0L, 1L, 0L, 1L, 1L, 1L, NA, 0L, 1L, NA, 1L)), class = "data.frame", row.names = c(NA,
-12L))
df %>%
group_by(ProgramName) %>%
summarise(across(HighSchool:Employment, ~ if(all(is.na(.))) NA else sum(., na.rm = TRUE)))
# A tibble: 3 × 3
ProgramName HighSchool Employment
<chr> <int> <int>
1 Program A 2 1
2 Program B 2 3
3 Program C 3 2
# This is the one you state that you actually want
df %>%
group_by(ProgramName) %>%
summarise(ED = sum(HighSchool == 1, na.rm = TRUE),
EMP = sum(Employment == 1, na.rm = TRUE))
A tibble: 3 × 3
ProgramName ED EMP
<chr> <int> <int>
1 Program A 2 1
2 Program B 2 3
3 Program C 3 2