r

Count the percentage of "Yes" values per year in long data


I have a data frame in long format. The variable pbp_b10b_bendesc_lim_al can be 2 (No), 1 (Yes) or NA. For each year, I want the percentage of rows with pbp_b10b_bendesc_lim_al == 1 out of the total number of rows, NAs included. However, mean() with na.rm = TRUE drops the NA rows from the denominator, so I get 100% in some years.

vbid_nodup_merged_filtered %>%
  select(Year, VBID, pbp_b10b_bendesc_lim_al) %>%
  arrange(Year) %>%
  add_count(Year) %>%
  group_by(Year) %>%
  mutate(pbp_b10b_bendesc_lim_al = ifelse(pbp_b10b_bendesc_lim_al == 1, 1, 0)) %>%
  mutate(percent_yes = 100*mean(pbp_b10b_bendesc_lim_al, na.rm = TRUE)) %>%
  slice(1)-> vbid_unlimited_percent

Solution

  • You can replace NA with another value, say 3, so that those rows stay in the denominator, and then summarise (replace_na() is from tidyr; its argument is named replace):

    vbid_nodup_merged_filtered |>
      mutate(pbp_b10b_bendesc_lim_al = replace_na(pbp_b10b_bendesc_lim_al, replace = 3)) |>
      summarise(percent_yes = 100 * mean(pbp_b10b_bendesc_lim_al == 1, na.rm = TRUE), .by = Year) |>
      arrange(Year)
    

    Note: this is untested because no sample data were supplied. Another option is to use mean_(.) from the hablar package.
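
    Since no data were supplied, here is a minimal sketch on made-up data to show the difference between the two denominators. The toy tibble and its values are assumptions for illustration only, not the asker's data. It also uses a variation on the approach above: x %in% 1 returns FALSE for NA, so the NA rows are counted as "not yes" without recoding them first. The .by argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical toy data: 1 = Yes, 2 = No, NA = missing
toy <- tibble(
  Year = c(2020, 2020, 2020, 2020, 2021, 2021),
  pbp_b10b_bendesc_lim_al = c(1, 2, NA, NA, 1, NA)
)

result <- toy |>
  summarise(
    n_rows = n(),
    # NA %in% 1 is FALSE, so every row stays in the denominator
    percent_yes = 100 * mean(pbp_b10b_bendesc_lim_al %in% 1),
    # For comparison: dropping NAs shrinks the denominator
    percent_yes_dropping_na = 100 * mean(pbp_b10b_bendesc_lim_al == 1, na.rm = TRUE),
    .by = Year
  ) |>
  arrange(Year)

print(result)
```

    On this toy data, percent_yes is 25 for 2020 and 50 for 2021, while the NA-dropping version gives 50 and 100. The 100 for 2021 is the symptom described in the question: the only non-missing row in that year is a "Yes".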