This question extends an answer to a previous question.
Briefly, @Jon Spring uses the following example code to produce a stacked bar plot with lines connecting each bar proportion between the two groups:
library(ggplot2)
set.seed(0)
data_bar <- data.frame(
stringsAsFactors = F,
Sample = rep(c("A", "B"), each = 10),
Percentage = runif(20),
Taxon = rep(1:10, by = 2)
)
library(tidyr)
ggplot() +
geom_bar(data = data_bar,
aes(x = Sample, y =Percentage, fill = Taxon),
colour = 'white', width = 0.3, stat="identity") +
geom_segment(data = tidyr::spread(data_bar, Sample, Percentage),
colour = "white",
aes(x = 1 + 0.3/2,
xend = 2 - 0.3/2,
y = cumsum(A),
yend = cumsum(B))) +
theme(panel.background = element_rect(fill = "black"), # to make connecting points
panel.grid = element_blank())
While this is an elegant piece of code for connecting the bar proportions, I cannot reproduce it once the bar proportion names are character strings instead of integers as above. Here is my code:
test.matrix<-matrix(c(70,120,65,140,13,68,46,294,52,410),ncol=2,byrow=TRUE)
rownames(test.matrix)<-c("BC.1","BC.2","GC","MO","EB")
colnames(test.matrix)<-c("12m","3m")
test.matrix <- as.data.frame(as.table(test.matrix)) # long format with columns Var1, Var2, Freq
ggplot() +
geom_bar(data = test.matrix,
aes(x = Var2, y =Freq, fill = Var1),
colour = 'black', width = 0.3, stat="identity") +
geom_segment(data = tidyr::spread(test.matrix, Var2, Freq),
colour = "black",
aes(x = 1 + 0.3/2,
xend = 2 - 0.3/2,
y = cumsum(`12m`),
yend = cumsum(`3m`))) +
scale_fill_manual(values=c('BC.1'="gold",'BC.2'="yellowgreen",'GC'="navy",'MO'="royalblue",'EB'="orangered")) +
theme(panel.background = element_rect(fill = "white"), panel.grid = element_blank())
The resulting geom_segment lines do not match the bar proportions. Maybe it has something to do with cumsum() using an alphabetical order of the strings, but I cannot figure out how to solve this, or maybe it's something completely different.
So I have two questions:
How can the bar proportions be connected if their order has to be fixed (a string vector or factor as 'names' for each value group or row)?
How can an additional geom_segment be drawn at the very bottom, connecting the lower ends of the two bars with one another?
The issue is that you apply cumsum in the wrong "direction" or order, i.e. you start cumulating at BC.1, while in the bar chart BC.1 sits at the top. This can be fixed simply by rearranging the dataset before cumulating. In my opinion it's best to do this outside of the plotting code so that you can easily check the data.
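To see the mismatch with plain numbers, here is a minimal base-R sketch using the 12m values from the question. The variable names (`freq_12m`, `alphabetical`, `bottom_up`) are made up for illustration, and it assumes the groups are handled in alphabetical order, as happens when Var1 is a character column.

```r
# 12m values from the question, keyed by group name
freq_12m <- c(BC.1 = 70, BC.2 = 65, GC = 13, MO = 46, EB = 52)

# Alphabetical order: cumulating here starts at BC.1
alphabetical <- freq_12m[sort(names(freq_12m))]
cumsum(alphabetical)
#> BC.1 BC.2   EB   GC   MO
#>   70  135  187  200  246

# ggplot2 stacks the first group on top, so from the bottom up
# the bar is built in the reverse order
bottom_up <- rev(alphabetical)
cumsum(bottom_up)
#>   MO   GC   EB BC.2 BC.1
#>   46   59  111  176  246
```

The first result gives the heights at which the segments were drawn, the second the actual boundaries inside the bar. Only the total, 246, coincides.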
To get another geom_segment at the bottom you can simply add a row to your data.
library(tidyverse)
test.matrix<-matrix(c(70,120,65,140,13,68,46,294,52,410),ncol=2,byrow=TRUE)
rownames(test.matrix)<-c("BC.1","BC.2","GC","MO","EB")
colnames(test.matrix)<-c("12m","3m")
test.matrix <- data.frame(test.matrix)
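# data.frame() turned the column names into X12m and X3m, so restore them,
# then reshape to long format with one row per group (Var1) and sample (Var2)
test.matrix <- test.matrix %>%
setNames(c("12m", "3m")) %>%
rownames_to_column(var = "Var1") %>%
pivot_longer(-Var1, names_to = "Var2", values_to = "Freq")
# Cumulate from the bottom of the stack upwards (ggplot2 puts the first group on top),
# then add a y = 0 row for the segment along the bottom of the bars
test.matrix.wide <- tidyr::spread(test.matrix, Var2, Freq) %>%
arrange(desc(Var1)) %>%
mutate(y = cumsum(`12m`),
yend = cumsum(`3m`)) %>%
add_row(y = 0, yend = 0)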
ggplot() +
geom_bar(data = test.matrix,
aes(x = Var2, y =Freq, fill = Var1),
colour = 'black', width = 0.3, stat="identity") +
geom_segment(data = test.matrix.wide,
colour = "black",
aes(x = 1 + 0.3/2,
xend = 2 - 0.3/2,
y = y,
yend = yend)) +
scale_fill_manual(values=c('BC.1'="gold",'BC.2'="yellowgreen",'GC'="navy",'MO'="royalblue",'EB'="orangered")) +
theme(panel.background = element_rect(fill = "white"), panel.grid = element_blank())
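If the order of the groups has to be fixed rather than alphabetical, one possible approach is to turn Var1 into a factor with explicit levels; `arrange(desc(Var1))` then follows the level order instead of the alphabet. The following is only a sketch under that assumption: `taxon_order`, `long`, `wide` and `p` are names introduced here for illustration, and the chosen order (the row order of the original matrix) is just an example.

```r
library(tidyverse)

# Hypothetical fixed order; the first entry ends up on top of each bar
taxon_order <- c("BC.1", "BC.2", "GC", "MO", "EB")

long <- data.frame(
  Var1 = rep(taxon_order, each = 2),
  Var2 = rep(c("12m", "3m"), times = 5),
  Freq = c(70, 120, 65, 140, 13, 68, 46, 294, 52, 410)
) %>%
  mutate(Var1 = factor(Var1, levels = taxon_order))

wide <- long %>%
  pivot_wider(names_from = Var2, values_from = Freq) %>%
  arrange(desc(Var1)) %>%                # last level first = bottom of the stack
  mutate(y = cumsum(`12m`),
         yend = cumsum(`3m`)) %>%
  add_row(y = 0, yend = 0)               # segment along the bottom of the bars

p <- ggplot() +
  geom_bar(data = long,
           aes(x = Var2, y = Freq, fill = Var1),
           colour = "black", width = 0.3, stat = "identity") +
  geom_segment(data = wide,
               colour = "black",
               aes(x = 1 + 0.3/2,
                   xend = 2 - 0.3/2,
                   y = y,
                   yend = yend)) +
  scale_fill_manual(values = c("BC.1" = "gold", "BC.2" = "yellowgreen", "GC" = "navy",
                               "MO" = "royalblue", "EB" = "orangered")) +
  theme(panel.background = element_rect(fill = "white"), panel.grid = element_blank())

print(p)
```

With this order EB sits at the bottom of each bar and BC.1 at the top; changing `taxon_order` changes both the stacking and the segment positions together, since both are derived from the same factor.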