Tags: r, ggplot2, label, percentage, stacked-chart

Add percentage labels to a stacked bar chart above bars


This is really similar to some questions that have been asked before, but more specific. This is a stacked bar chart that I made using ggplot. It shows numbers of positive and negative samples for 10 different antibodies tested in 16 different labs (specifics changed to protect confidentiality). I want to show the percent positive on top of each bar (i.e. outside of the bar area and hovering above the green part). However, for the ones where there is no space there (e.g. "Lab 11") it should be on the inside of the green area of the bar and maybe with white text so it shows up.

[Image: the faceted stacked bar chart described above]

Here is the code that I used:

bar <- ggplot(datas, aes(fill=Status, y=Number, x=Antibody)) +
    geom_bar(position="stack", stat="identity") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
          panel.spacing.x = unit(0.1, "lines"), panel.spacing.y = unit(0.1, "lines"),
          legend.position = "bottom") +
    facet_wrap(~Lab, nrow=4) +
    scale_fill_brewer(palette = "Set2")

Let me know if I should post the data too (I don't really know how to do that).

Thanks,

Josh


DATA


Here is a dput of some fake data:

datas = structure(list(Antibody = c("ab_1", "ab_1", "ab_1", "ab_1", "ab_1", 
"ab_1", "ab_2", "ab_2", "ab_2", "ab_2", "ab_2", "ab_2", "ab_3", 
"ab_3", "ab_3", "ab_3", "ab_3", "ab_3"), Lab = c("lab_1", "lab_1", 
"lab_2", "lab_2", "lab_3", "lab_3", "lab_1", "lab_1", "lab_2", 
"lab_2", "lab_3", "lab_3", "lab_1", "lab_1", "lab_2", "lab_2", 
"lab_3", "lab_3"), number_tests = c(1382, 1382, 1951, 1951, 1034, 
1034, 1382, 1382, 1951, 1951, 1034, 1034, 1382, 1382, 1951, 1951, 
1034, 1034), prop_pos = c(0.587053193943575, 0.587053193943575, 
0.587053193943575, 0.587053193943575, 0.587053193943575, 0.587053193943575, 
0.683785125147551, 0.683785125147551, 0.683785125147551, 0.683785125147551, 
0.683785125147551, 0.683785125147551, 0.279249225975946, 0.279249225975946, 
0.279249225975946, 0.279249225975946, 0.279249225975946, 0.279249225975946
), Status = c("npos", "nneg", "npos", "nneg", "npos", "nneg", 
"npos", "nneg", "npos", "nneg", "npos", "nneg", "npos", "nneg", 
"npos", "nneg", "npos", "nneg"), Number = c(799, 583, 1144, 807, 
606, 428, 945, 437, 1320, 631, 708, 326, 380, 1002, 554, 1397, 
276, 758)), row.names = c(NA, -18L), class = c("tbl_df", "tbl", 
"data.frame"))

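For anyone else unsure how to post data like this: output of the kind shown above comes from dput(), which prints R code that rebuilds an object. A minimal sketch, where example_df and rebuilt are made-up names and the tiny data frame is only a stand-in for the real one:

```r
# Hypothetical small data frame standing in for the real data.
example_df <- data.frame(
  Antibody = c("ab_1", "ab_1"),
  Status   = c("npos", "nneg"),
  Number   = c(799, 583)
)

# dput() prints R code that recreates the object; paste that text into the post.
dput(example_df)

# Round trip: parsing and evaluating the printed text rebuilds the data frame.
rebuilt <- eval(parse(text = capture.output(dput(example_df))))
```

For a large data frame, something like dput(head(datas, 20)) keeps the pasted text short.
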
Solution

  • Let's try not to call our data "data", since data() is already a function in R!

    I'm using the fake data that I edited into your question, stored as datas.

    You can get what you want by adding a geom_text layer that only uses the rows for positives (Status == "npos"), so each bar gets exactly one label.

    library(tidyverse)  # provides ggplot2 plus dplyr's %>% and filter()

    ggplot(datas, aes(fill=Status, y=Number, x=Antibody)) +
     geom_bar(position="stack", stat="identity") +
     theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
           panel.spacing.x=unit(0.1, "lines") , panel.spacing.y=unit(0.1,"lines"),
           legend.position ="bottom") +
     facet_wrap(~Lab,nrow=4) +
     scale_fill_brewer(palette = "Set2") +
     # label layer: keep only the "npos" rows, so there is one label per bar
     geom_text(data = datas %>%
                filter(Status == "npos"),
               aes(label = round(Number/number_tests, 3)),
               vjust = 0)
    

    [Image: output of the code above]


    DATA


    library(tidyverse)
    # No seed is set, so the simulated values differ from run to run.
    datas <- tibble(Lab = rep(paste0("lab_", 1:3), times = 3),
                 Antibody = rep(paste0("ab_", 1:3), each = 3)) %>%
     # one number_tests value per lab
     group_by(Lab) %>%
     nest() %>%
     mutate(number_tests = round(runif(1, 1000, 2100))) %>%
     unnest(data) %>%
     # one prop_pos value per antibody
     group_by(Antibody) %>%
     nest() %>%
     mutate(prop_pos = runif(n = 1)) %>%
     unnest(data) %>%
     ungroup() %>%
     # draw the positives, then derive the negatives from the total
     mutate(npos = map2_dbl(number_tests, prop_pos,
                            ~ rbinom(n = 1, size = (.x), prob = .y)),
            nneg = number_tests - npos) %>%
     pivot_longer(cols = c(npos, nneg), names_to = "Status", values_to = "Number")
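
A possible extension of the geom_text approach above, closer to what the question asks for: percent labels above each whole bar, moved inside the bar in white when there is no room. This is only a sketch under assumptions. The small data frame is a stand-in for datas (values copied from the question's dput), the names labels, crowded, crowded_cutoff, v and col are made up for illustration, and the 0.9 cutoff is a guess to tune by eye.

```r
library(ggplot2)
library(dplyr)

# Small stand-in for `datas`, using the same column names as the question.
datas <- data.frame(
  Antibody     = rep(c("ab_1", "ab_2"), each = 4),
  Lab          = rep(rep(c("lab_1", "lab_2"), each = 2), times = 2),
  number_tests = rep(rep(c(1382, 1951), each = 2), times = 2),
  Status       = rep(c("npos", "nneg"), times = 4),
  Number       = c(799, 583, 1144, 807, 945, 437, 1320, 631)
)

# Hypothetical threshold: bars taller than this share of the tallest bar
# are treated as having no room above them.
crowded_cutoff <- 0.9

# One row per bar (the positives), with a percent label and a placement rule.
labels <- datas %>%
  filter(Status == "npos") %>%
  mutate(
    pct     = scales::percent(Number / number_tests, accuracy = 0.1),
    crowded = number_tests > crowded_cutoff * max(number_tests),
    v       = ifelse(crowded, 1.5, -0.5),        # below the bar top vs. above it
    col     = ifelse(crowded, "white", "black")  # white text when drawn inside
  )

p <- ggplot(datas, aes(x = Antibody, y = Number)) +
  geom_col(aes(fill = Status)) +                 # geom_col stacks by default
  geom_text(data = labels,
            aes(y = number_tests, label = pct, vjust = v, colour = col),
            size = 3) +
  scale_colour_identity() +                      # use the colour names as given
  scale_fill_brewer(palette = "Set2") +
  facet_wrap(~Lab) +
  theme(legend.position = "bottom")
```

print(p) draws the plot. The cutoff idea relies on facet_wrap sharing one y-axis across panels by default, so only the labs with the most tests have bars that reach the top of their panel.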