geom_bar(): plotting the frequency of a subgroup out of total observations


I'm relatively new to R. I have a data frame (my.data) with two columns: "PHENO", a factor with two levels (1 and 2), and "bins", which is numeric (a natural number from 1 to 10). I'm trying to plot the frequency (as a percentage) of PHENO == 2 in each bin, where 100% is the total number of observations (levels 1 and 2 combined).

This is what I tried, but here 100% is not the total number of observations:

ggplot(data = subset(my.data, PHENO == 2)) + 
  geom_bar(mapping = aes(x = as.factor(bins), y = ..prop.., group = 1), stat = "count") +
  scale_y_continuous(labels = scales::percent_format(), limits = c(0,0.15)) +
  geom_hline(yintercept = 0.05, linetype="dashed", color = 'blue', size = 1) + 
  annotate(geom = "text", label = 'Prevalence 5%', x = 1.5, y = 0.05, vjust = -1, col = 'blue')

Also, I tried to add frequency labels over the bars but it didn't work:

geom_text(aes(label = as.factor(bins)), position=position_dodge(width=0.9), vjust = -0.25)

I would appreciate your help.


Solution

  • Is this what you need?

    library(dplyr)
    library(ggplot2)

    df %>% 
      count(PHENO, bins) %>%                  # one row per PHENO/bins combination, count in n
      mutate(Percent = n / sum(n) * 100) %>%  # percent of ALL observations (both PHENO levels)
      filter(PHENO == "2") %>%                # filter only after computing Percent, so 100% stays all observations
      ggplot(aes(y = Percent, x = bins)) +
      geom_col() +
      geom_hline(yintercept = 5, linetype = "dashed", color = 'blue', size = 1) +
      geom_text(aes(label = sprintf("%.1f%%", Percent)), vjust = -0.25)  # label each bar with its percentage
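
    The order of the steps matters: Percent has to be computed before the filter() on PHENO, otherwise the denominator shrinks to the PHENO == 2 rows only. As a rough check, here is a base R sketch that compares both denominators. It uses the mock data shown further down, rebuilt with data.frame() so the snippet runs on its own; the names counts, pct_of_all and pct_of_pheno2 are mine.

    ```r
    # Mock data (same values as the structure() call further down)
    df <- data.frame(
      PHENO = factor(c(1, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2)),
      bins  = c(1, 2, 4, 5, 7, 8, 9, 5, 2, 3, 6, 9, 10, 5, 6, 6, 6, 4)
    )

    # Cross-tabulate bins (rows) by PHENO (columns)
    counts <- table(df$bins, df$PHENO)

    # Denominator = all observations (what the question asks for)
    pct_of_all <- counts[, "2"] / nrow(df) * 100

    # Denominator = PHENO == 2 rows only (what subsetting before counting gives)
    pct_of_pheno2 <- counts[, "2"] / sum(counts[, "2"]) * 100

    # Side-by-side comparison, one row per bin
    round(cbind(pct_of_all, pct_of_pheno2), 1)
    ```

    On this data, bin 5 is about 11.1% of all observations but about 28.6% of the PHENO == 2 rows, which is the kind of discrepancy described in the question.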
    

    I used the following mock data for illustration; it may not correspond to yours:

    df <- structure(list(PHENO = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 
    2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L), .Label = c("1", 
    "2"), class = "factor"), bins = c(1, 2, 4, 5, 7, 8, 9, 5, 2, 
    3, 6, 9, 10, 5, 6, 6, 6, 4)), class = "data.frame", row.names = c(NA, 
    -18L))
    

    Result:

    [Image: bar plot of Percent against bins for PHENO == 2]
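
    If you would rather stay with geom_bar() as in your attempt, one option is to keep the subset but divide the computed count by the number of rows in the full data frame. This is only a sketch, assuming ggplot2 >= 3.3.0 (needed for after_stat()); n_total and p are names I made up, and the mock data is rebuilt with data.frame() so the snippet runs on its own.

    ```r
    library(ggplot2)

    # Mock data (same values as above); replace df with my.data for the real case
    df <- data.frame(
      PHENO = factor(c(1, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2)),
      bins  = c(1, 2, 4, 5, 7, 8, 9, 5, 2, 3, 6, 9, 10, 5, 6, 6, 6, 4)
    )

    n_total <- nrow(df)  # denominator: all observations, both PHENO levels

    p <- ggplot(subset(df, PHENO == 2), aes(x = factor(bins))) +
      # after_stat(count) is the per-bin count of PHENO == 2 rows
      geom_bar(aes(y = after_stat(count / n_total))) +
      # stat = "count" makes the same computed count available for the labels
      geom_text(
        aes(y = after_stat(count / n_total),
            label = after_stat(scales::percent(count / n_total, accuracy = 0.1))),
        stat = "count", vjust = -0.25
      ) +
      scale_y_continuous(labels = scales::percent_format()) +
      labs(x = "bins", y = "PHENO == 2 (% of all observations)")

    p
    ```

    Because x is a factor built from the subset, bins with no PHENO == 2 rows do not appear on the axis; the dplyr version above, with numeric bins, leaves a gap for them instead.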