r, ggplot2, bar-chart, visualization, confidence-interval

Plot 95% CIs for a proportions table in ggplot2


As far as I'm concerned, the most informative error bar is the 95% CI, so I want to plot it for my proportions table. How do I correctly calculate the 95% CI for a proportions table, and how do I plot it with ggplot2?

  • data:

The data give the number of schools per region (A to E), from which I compute the proportions.

## calculate proportions:

library(dplyr); library(forcats)  # %>%, count(), mutate(), arrange(); fct_reorder()
region <- data %>% count(Q8) %>%
  mutate(prop = round(prop.table(n) * 100, digits = 2),
         sd = round(sd(prop.table(n)), digits = 2),
         Q8 = fct_reorder(Q8, n)) %>%
  arrange(n)

## output
> region
  Q8  n  prop   sd
1  E  3 10.34 0.12
2  C  3 10.34 0.12
3  B  4 13.79 0.12
4  A  9 31.03 0.12
5  D 10 34.48 0.12
  • Cool. Now I need to calculate the 95% CI. I've tried:
region_ci  <- data.frame(DescTools::MultinomCI(region$n, conf.level = 0.95)) %>%
               mutate_if(is.numeric, round, 2)
 
> region_ci
   est lwr.ci upr.ci
1 0.10   0.00   0.29
2 0.10   0.00   0.29
3 0.14   0.00   0.32
4 0.31   0.14   0.49
5 0.34   0.17   0.53
  • Now I'd like to plot the proportions with error bars. My attempt:
 region %>% 
  ggplot(aes(y = prop, x = ordered(Q8), fill = Q8)) + 
  geom_bar(stat = "identity", width = 0.3) +
  geom_errorbar(aes(ymin= region_ci$lwr.ci, ymax= region_ci$upr.ci, 
                    width= .1)) + 
  geom_text(aes(label = round(prop, 1.5)),
            nudge_y = 2) + # so the labels don't hit the tops of the bars
  labs(x = "place",
       y = '(%)')
  • which gives me this:

[image: plot produced by the code above]

  • Question: It's clear that I've calculated the CIs wrong. How can I calculate them properly and plot the correct error bars? I've seen some similar posts, such as this one, but I'm still not sure how to calculate the CIs correctly.

  • I've also tried the approach suggested here, but I got weird results there too. Thanks in advance.

  • data:

> dput(region)
structure(list(Q8 = structure(1:5, .Label = c("E", "C", "B", 
"A", "D"), class = "factor"), n = c(3L, 3L, 4L, 9L, 10L), prop = c(10.34, 
10.34, 13.79, 31.03, 34.48), sd = c(0.12, 0.12, 0.12, 0.12, 0.12
)), row.names = c(NA, -5L), class = "data.frame")
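
As a rough cross-check on the interval calculation, here is a minimal base R sketch that computes a separate exact (Clopper-Pearson) interval for each region with `binom.test` and converts it to percent. Note that this is a different technique from `DescTools::MultinomCI`, which produces simultaneous multinomial intervals, so the bounds are not expected to match exactly. The names `ci` and `ci_pct` are made up for illustration; the counts are the ones posted above.

# Counts per region as posted in the question
n <- c(E = 3L, C = 3L, B = 4L, A = 9L, D = 10L)
total <- sum(n)

# Exact (Clopper-Pearson) 95% interval for each region, treating each
# region as its own binomial proportion (not simultaneous intervals)
ci <- t(sapply(n, function(k) binom.test(k, total, conf.level = 0.95)$conf.int))
colnames(ci) <- c("lwr", "upr")

# Express the estimate and the bounds in percent, matching the prop column
ci_pct <- data.frame(
  Q8   = names(n),
  prop = as.numeric(n) / total * 100,
  lwr  = ci[, "lwr"] * 100,
  upr  = ci[, "upr"] * 100
)
print(round(ci_pct[, -1], 2))

The point of the check is only that estimate and bounds end up on the same 0 to 100 scale before they are compared or plotted.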

Solution

  • Your code is correct. The only issue is a scale mismatch: the prop values in region are percentages (proportion * 100), but the CI values in region_ci are proportions between 0 and 1. So multiply the lower and upper CI values by 100 in the ggplot call as well:

     region %>% 
      ggplot(aes(y = prop, x = ordered(Q8), fill = Q8)) + 
      geom_bar(stat = "identity", width = 0.3) +
      # CI bounds are proportions (0 to 1); scale them to percent to match prop
      geom_errorbar(aes(ymin = region_ci$lwr.ci * 100, ymax = region_ci$upr.ci * 100, 
                        width = .1)) + 
      geom_text(aes(label = round(prop, 1.5)),
                nudge_y = 2) + # so the labels don't hit the tops of the bars
      labs(x = "place",
           y = '(%)')
    

    [image: graph output]
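
One possible variation, sketched under assumptions: instead of referencing region_ci from inside aes(), the scaled bounds can be stored as columns next to prop, so every aesthetic comes from one data frame. The values below are the rounded ones posted in the question; the names `region_plot`, `lwr_pct` and `upr_pct` are hypothetical, and the approach relies on both data frames being in the same row order.

# Counts and percentages as posted in the question
region <- data.frame(
  Q8   = factor(c("E", "C", "B", "A", "D"), levels = c("E", "C", "B", "A", "D")),
  n    = c(3L, 3L, 4L, 9L, 10L),
  prop = c(10.34, 10.34, 13.79, 31.03, 34.48)
)

# Rounded MultinomCI output as posted in the question (proportions, 0 to 1)
region_ci <- data.frame(
  est    = c(0.10, 0.10, 0.14, 0.31, 0.34),
  lwr.ci = c(0.00, 0.00, 0.00, 0.14, 0.17),
  upr.ci = c(0.29, 0.29, 0.32, 0.49, 0.53)
)

# Put the bounds on the percent scale and keep them beside prop
# (assumes region and region_ci share the same row order)
region_plot <- cbind(
  region,
  lwr_pct = region_ci$lwr.ci * 100,
  upr_pct = region_ci$upr.ci * 100
)

# Draw the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(region_plot, aes(x = Q8, y = prop, fill = Q8)) +
    geom_col(width = 0.3) +
    geom_errorbar(aes(ymin = lwr_pct, ymax = upr_pct), width = 0.1) +
    labs(x = "place", y = "(%)")
  print(p)
}

Here geom_col() stands in for geom_bar(stat = "identity"), and width is set outside aes() because it is a fixed value rather than a mapped variable.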