
How to use geom_bar to connect stacked-bar proportions when the bar category names are character strings


This is an extension of a previous answer to a question found here.

Briefly, @Jon Spring uses the following example code to produce a stacked bar plot with lines connecting each bar proportion between the two groups:

library(ggplot2)
set.seed(0)
data_bar <- data.frame(
  stringsAsFactors = FALSE,
  Sample = rep(c("A", "B"), each = 10),
  Percentage = runif(20),
  Taxon = rep(1:10, times = 2)  # taxa 1-10 for each of the two samples
)
library(tidyr)
ggplot() +
  geom_bar(data = data_bar,
           aes(x = Sample, y =Percentage, fill = Taxon),
           colour = 'white', width = 0.3, stat="identity") +
  geom_segment(data = tidyr::spread(data_bar, Sample, Percentage),
               colour = "white",
               aes(x = 1 + 0.3/2,
                   xend = 2 - 0.3/2,
                   y = cumsum(A),
                   yend = cumsum(B))) +
  theme(panel.background = element_rect(fill = "black"), # black background so the white connecting lines are visible
        panel.grid = element_blank())

[Image: geom_seg example]
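The `y` and `yend` values in the code above are running totals: for one bar whose pieces are listed from bottom to top, `cumsum()` gives the top edge of each piece, which is where each connecting line has to start or end. A minimal base-R illustration with made-up heights (the values are hypothetical, not from the post):

```r
# Hypothetical heights of the pieces of one stacked bar, listed bottom to top
heights <- c(2, 3, 5)

# Top edge of each piece = running total of its own height and everything below it
tops <- cumsum(heights)

# Bottom edge of each piece = top edge of the piece below it (0 for the lowest piece)
bottoms <- c(0, head(tops, -1))

print(data.frame(piece = seq_along(heights), bottom = bottoms, top = tops))
```

This is also why the row order matters: `cumsum()` only yields the correct edges if the rows are in the same bottom-to-top order in which the bars are drawn.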

While this is an elegant piece of code for connecting the bar proportions, I am not able to reproduce it once the bar proportion names are character strings instead of integers as above. Here is my code:

test.matrix<-matrix(c(70,120,65,140,13,68,46,294,52,410),ncol=2,byrow=TRUE)
rownames(test.matrix)<-c("BC.1","BC.2","GC","MO","EB")
colnames(test.matrix)<-c("12m","3m")
# long format with columns Var1 (row name), Var2 (column name) and Freq (value)
test.matrix <- as.data.frame(as.table(test.matrix))

ggplot() +
  geom_bar(data = test.matrix,
           aes(x = Var2, y =Freq, fill = Var1),
           colour = 'black', width = 0.3, stat="identity") +
  geom_segment(data = tidyr::spread(test.matrix, Var2, Freq),
               colour = "black",
               aes(x = 1 + 0.3/2,
                   xend = 2 - 0.3/2,
                   y = cumsum(`12m`),
                   yend = cumsum(`3m`))) +
  scale_fill_manual(values=c('BC.1'="gold",'BC.2'="yellowgreen",'GC'="navy",'MO'="royalblue",'EB'="orangered")) +
  theme(panel.background = element_rect(fill = "white"), panel.grid = element_blank())

[Image: geom_seg char]

The resulting geom_segment lines do not match the bar proportions. Maybe it has something to do with cumsum() using an alphabetical order of the strings, but I cannot figure out how to solve this, or it's something completely different.
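Rather than guessing, the stacking order can be read back from the built plot. The sketch below is an illustration, not part of the original post: it rebuilds the question's data with `as.data.frame(as.table(...))` (an assumption about how the long format with `Var1`, `Var2` and `Freq` was produced), labels each stacked piece with `geom_text()` and `position_stack(vjust = 0.5)`, and uses ggplot2's `layer_data()` to list the pieces of the first bar from bottom to top, next to the row order that `spread()` hands to `cumsum()`.

```r
library(ggplot2)
library(tidyr)

# The question's matrix, converted to long format (columns Var1, Var2, Freq).
# as.table() keeps the row order, so Var1's levels are BC.1, BC.2, GC, MO, EB.
m <- matrix(c(70, 120, 65, 140, 13, 68, 46, 294, 52, 410), ncol = 2, byrow = TRUE,
            dimnames = list(c("BC.1", "BC.2", "GC", "MO", "EB"), c("12m", "3m")))
long <- as.data.frame(as.table(m))

# Label each stacked piece with its Var1 value, placed at the piece's midpoint
p <- ggplot(long, aes(x = Var2, y = Freq, fill = Var1)) +
  geom_col(width = 0.3) +
  geom_text(aes(label = Var1), position = position_stack(vjust = 0.5))

# layer_data() returns the computed positions of the text layer (layer 2)
labels <- layer_data(p, 2)
first_bar <- labels[as.numeric(labels$x) == 1, ]
bottom_to_top <- as.character(first_bar$label[order(first_bar$y)])

# Row order produced by spread(), i.e. the order in which cumsum() accumulates
cumsum_order <- as.character(spread(long, Var2, Freq)$Var1)

print(bottom_to_top)
print(cumsum_order)
```

Because `position_stack()` puts the first factor level on top by default, the two vectors should come out in opposite orders: `cumsum()` starts accumulating at the top piece of the bar instead of the bottom one.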

So I have two questions:

  1. How can the bar proportions be connected when their order has to be fixed (a character vector or factor giving the 'names' of each value group or row)?

  2. How can an additional geom_segment be drawn at the very bottom, connecting the lower ends of the two bars?


Solution

    1. The issue is that you computed the cumulative sums in the wrong "direction", i.e. cumsum() starts at BC.1, while in the bar chart BC.1 is at the top (ggplot2 stacks the first factor level of the fill variable on top). This can be fixed by rearranging the dataset into reverse order before computing the cumulative sums. In my opinion it's best to do this outside of the plotting code so that you can easily check the data.

    2. To get another geom_segment at the bottom, simply add a row with y = 0 and yend = 0 to the segment data.

    library(tidyverse)
    
    test.matrix<-matrix(c(70,120,65,140,13,68,46,294,52,410),ncol=2,byrow=TRUE)
    rownames(test.matrix)<-c("BC.1","BC.2","GC","MO","EB")
    colnames(test.matrix)<-c("12m","3m")
    test.matrix <- data.frame(test.matrix)
    
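    # data.frame() renamed the columns to X12m/X3m, so restore the names,
    # then reshape to long format: one row per (Var1, Var2) pair with its Freq
    test.matrix <- test.matrix %>%
      setNames(c("12m", "3m")) %>%
      rownames_to_column(var = "Var1") %>%
      pivot_longer(-Var1, names_to = "Var2", values_to = "Freq")
    
    # Segment data: reverse the row order so cumsum() runs from the bottom of
    # each bar to the top, then add a row at 0 for a line joining the bar bottoms
    test.matrix.wide <- tidyr::spread(test.matrix, Var2, Freq) %>%
      arrange(desc(Var1)) %>%
      mutate(y = cumsum(`12m`),
             yend = cumsum(`3m`)) %>%
      add_row(y = 0, yend = 0)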
    
    ggplot() +
      geom_bar(data = test.matrix,
               aes(x = Var2, y =Freq, fill = Var1),
               colour = 'black', width = 0.3, stat="identity") +
      geom_segment(data = test.matrix.wide,
                   colour = "black",
                   aes(x = 1 + 0.3/2,
                       xend = 2 - 0.3/2,
                       y = y,
                       yend = yend)) +
      scale_fill_manual(values=c('BC.1'="gold",'BC.2'="yellowgreen",'GC'="navy",'MO'="royalblue",'EB'="orangered")) +
      theme(panel.background = element_rect(fill = "white"), panel.grid = element_blank())
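The code above relies on `Var1` being a character column, so `arrange(desc(Var1))` sorts alphabetically, which matches ggplot2's default alphabetical factor levels. If the order has to be fixed to something else (question 1), one possible variant, sketched here and not the answerer's code, keeps `Var1` as a factor with the wanted level order; `arrange()` then sorts by level order instead of alphabetically. The assumption is that the matrix row order (BC.1, BC.2, GC, MO, EB) is the wanted top-to-bottom order. It also swaps the superseded `spread()` for `pivot_wider()` and uses `geom_col()`, which is equivalent to `geom_bar(stat = "identity")`.

```r
library(dplyr)
library(tibble)
library(tidyr)
library(ggplot2)

m <- matrix(c(70, 120, 65, 140, 13, 68, 46, 294, 52, 410), ncol = 2, byrow = TRUE,
            dimnames = list(c("BC.1", "BC.2", "GC", "MO", "EB"), c("12m", "3m")))

# Long format; as.table() makes Var1 a factor whose levels follow the row
# order of m, which is assumed here to be the fixed order that is wanted
long <- as.data.frame(as.table(m))

# ggplot2 stacks the first factor level on top, so accumulate the pieces in
# reverse level order (bottom to top); the extra row at y = 0 draws a line
# between the bottom ends of the two bars
wide <- long %>%
  pivot_wider(names_from = Var2, values_from = Freq) %>%
  arrange(desc(Var1)) %>%
  mutate(y = cumsum(`12m`), yend = cumsum(`3m`)) %>%
  add_row(y = 0, yend = 0)

p <- ggplot() +
  geom_col(data = long, aes(x = Var2, y = Freq, fill = Var1),
           colour = "black", width = 0.3) +
  geom_segment(data = wide, colour = "black",
               aes(x = 1 + 0.3/2, xend = 2 - 0.3/2, y = y, yend = yend)) +
  theme(panel.background = element_rect(fill = "white"),
        panel.grid = element_blank())
```

Call `print(p)` to draw the plot. To confirm that the lines meet the bar edges, the `ymax` values in `layer_data(p, 1)` for each bar can be compared with `wide$y` and `wide$yend`.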