I have a data frame in long format. The variable pbp_b10b_bendesc_lim_al can be 2 (No), 1 (Yes), or NA. I want to compute, for each year, the percentage of rows where pbp_b10b_bendesc_lim_al == 1 out of the total number of rows in that year. However, when I use mean() with na.rm = TRUE, the NAs are dropped from the denominator, so I'm getting 100% in some years.
vbid_nodup_merged_filtered %>%
  select(Year, VBID, pbp_b10b_bendesc_lim_al) %>%
  arrange(Year) %>%
  add_count(Year) %>%
  group_by(Year) %>%
  mutate(pbp_b10b_bendesc_lim_al = ifelse(pbp_b10b_bendesc_lim_al == 1, 1, 0)) %>%
  mutate(percent_yes = 100 * mean(pbp_b10b_bendesc_lim_al, na.rm = TRUE)) %>%
  slice(1) -> vbid_unlimited_percent
You can replace NA with another value, say 3, so that those rows stay in the denominator, and then summarise:
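library(dplyr)  # the .by argument needs dplyr >= 1.1.0
library(tidyr)  # replace_na()
vbid_nodup_merged_filtered |>
  # recode NA to a placeholder (3); the second argument of replace_na() is `replace`
  mutate(pbp_b10b_bendesc_lim_al = replace_na(pbp_b10b_bendesc_lim_al, 3)) |>
  summarise(percent_yes = 100 * mean(pbp_b10b_bendesc_lim_al == 1), .by = Year) |>
  arrange(Year)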
Note: no data supplied. Another option is to use mean_(.) from the hablar package.
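Since no data was supplied, here is a minimal sketch on made-up data that contrasts the two denominators. The toy data frame and the column names percent_yes_na_rm and percent_yes_all are assumptions for illustration only. It also swaps in a different technique from the placeholder recode: comparing with %in% instead of ==, which returns FALSE rather than NA for missing values, so no recoding is needed.

```r
library(dplyr)  # the .by argument needs dplyr >= 1.1.0

# Hypothetical toy data: 1 = Yes, 2 = No, NA = missing
toy <- tibble(
  Year = c(2020, 2020, 2020, 2020, 2021, 2021, 2021),
  pbp_b10b_bendesc_lim_al = c(1, NA, NA, 2, 1, NA, NA)
)

result <- toy |>
  summarise(
    # NA rows are dropped from the denominator (the behaviour in the question)
    percent_yes_na_rm = 100 * mean(pbp_b10b_bendesc_lim_al == 1, na.rm = TRUE),
    # %in% yields FALSE for NA, so every row counts in the denominator
    percent_yes_all = 100 * mean(pbp_b10b_bendesc_lim_al %in% 1),
    .by = Year
  ) |>
  arrange(Year)

result
```

In this toy data, 2021 has one Yes and two NAs, so the na.rm = TRUE version reports 100% while the version that counts all rows reports about 33%.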