ggplot2 reshape2

Stack bars with percentages and values shown


Here is my data frame, data_long1:

data_long1 <- data.frame(
       value = c(88, 22, 100, 12, 55, 17, 10, 2, 2),
     Subtype = as.factor(c("lung","prostate",
                           "oesophagus","lung","prostate","oesophagus","lung",
                           "prostate","oesophagus")),
    variable = as.factor(c("alive","alive",
                           "alive","dead","dead","dead","uncertain","uncertain",
                           "uncertain"))
)

The following code gives me the graph I want, with all the values displayed, but none of them as percentages.

ggplot(data_long1, aes(x = Subtype, y = value, fill = variable)) + geom_bar(stat = "identity") + 
geom_text(aes(label= value), size = 3, hjust = 0.1, vjust = 2, position = "stack")

What I am looking for is a stacked bar chart with the actual values on the y-axis, not percentages (as in the previous graph), but also with a percentage shown on each segment of the bars. I tried the code below and got a meaningless graph in which every segment is labelled 33.3%.

data_long1 %>%
  count(Subtype, variable) %>%
  group_by(Subtype) %>%
  mutate(pct = prop.table(n) * 100) %>%
  ggplot() +
  aes(x = Subtype, y = variable, fill = variable) +
  geom_bar(stat = "identity") +
  ylab("Number of Patients") +
  geom_text(aes(label = paste0(sprintf("%1.1f", pct), "%")), position = position_stack(vjust = 0.5)) +
  ggtitle("My Tumour Sites") +
  theme_bw()

I cannot seem to find a way to use the mutate function to resolve this problem. Please help.


Solution

  • I would pre-compute the summaries you want. Here is the proportion within each subtype:

    library(dplyr)
    library(ggplot2)

    data_long2 <- data_long1 %>% 
      group_by(Subtype) %>% 
      mutate(proportion = value / sum(value))
    
    ggplot(data_long2, aes(x = Subtype, y = value, fill = variable)) + 
      geom_bar(stat = "identity") + 
      geom_text(aes(label= sprintf('%0.0f%%', proportion * 100)), size = 3, hjust = 0.1, vjust = 2, position = "stack")
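
    If you want both the count and the percentage on each segment, as the title asks, one option is to build the label text in the same mutate() call. This is a minimal sketch rather than the only way to do it: the names labelled, label and p are my own, geom_col() is shorthand for geom_bar(stat = "identity"), and position_stack(vjust = 0.5) centres each label in its segment instead of using the hjust/vjust offsets above. The data frame is rebuilt here only so that the snippet runs on its own.

    ```r
    library(dplyr)
    library(ggplot2)

    # Same data as in the question, rebuilt so the sketch is self-contained
    data_long1 <- data.frame(
      value    = c(88, 22, 100, 12, 55, 17, 10, 2, 2),
      Subtype  = factor(rep(c("lung", "prostate", "oesophagus"), times = 3)),
      variable = factor(rep(c("alive", "dead", "uncertain"), each = 3))
    )

    # Proportion within each subtype, plus a "count (percent)" label
    labelled <- data_long1 %>%
      group_by(Subtype) %>%
      mutate(
        proportion = value / sum(value),
        label      = sprintf("%.0f (%.0f%%)", value, proportion * 100)
      ) %>%
      ungroup()

    # Counts stay on the y-axis; labels sit in the middle of each segment
    p <- ggplot(labelled, aes(x = Subtype, y = value, fill = variable)) +
      geom_col() +
      geom_text(aes(label = label), size = 3,
                position = position_stack(vjust = 0.5)) +
      ylab("Number of Patients")

    p
    ```

    Labels on very small segments (for example the 2 uncertain oesophagus patients) may overlap their neighbours. Reducing size or dropping labels below some threshold are possible workarounds.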
    


    You can also get the proportion across all groups and types simply by removing the group_by statement:

    data_long2 <- data_long1 %>% 
      mutate(proportion = value / sum(value))
    
    ggplot(data_long2, aes(x = Subtype, y = value, fill = variable)) + 
      geom_bar(stat = "identity") + 
      geom_text(aes(label= sprintf('%0.0f%%', proportion * 100)), size = 3, hjust = 0.1, vjust = 2, position = "stack")
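
    As for why the attempt in the question labels every segment 33.3%: count() counts rows, not patients, and each Subtype/variable pair occupies exactly one row of data_long1, so n is 1 everywhere and each share within a subtype is one third. That plot also maps y to variable rather than to a number. The sketch below checks this and shows one possible fix using the wt argument of count(); the names counted and weighted are my own.

    ```r
    library(dplyr)

    # Same data as in the question, rebuilt so the sketch is self-contained
    data_long1 <- data.frame(
      value    = c(88, 22, 100, 12, 55, 17, 10, 2, 2),
      Subtype  = factor(rep(c("lung", "prostate", "oesophagus"), times = 3)),
      variable = factor(rep(c("alive", "dead", "uncertain"), each = 3))
    )

    # The original approach: n is the number of rows per pair, not patients
    counted <- data_long1 %>%
      count(Subtype, variable) %>%
      group_by(Subtype) %>%
      mutate(pct = prop.table(n) * 100) %>%
      ungroup()

    # Each pair occurs once, so every n is 1 and every pct is 100 / 3
    print(counted)

    # Weighting the count by value makes n the number of patients instead
    weighted <- data_long1 %>%
      count(Subtype, variable, wt = value) %>%
      group_by(Subtype) %>%
      mutate(pct = prop.table(n) * 100) %>%
      ungroup()

    print(weighted)
    ```

    With weighted you would map y = n in the plot, which gives the same result as the mutate() approach above.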
    
