I have experience making alluvial plots with the ggalluvial package. However, I have run into an issue while trying to create an alluvial plot in which two different sources converge onto one variable.
Here is example data:
library(dplyr)
library(ggplot2)
library(ggalluvial)
data <- data.frame(
  unique_alluvium_entires = 1:10,
  label_1 = c("A", "B", "C", "D", "E", rep(NA, 5)),
  label_2 = c(rep(NA, 5), "F", "G", "H", "I", "J"),
  shared_label = c("a", "b", "c", "c", "c", "c", "c", "a", "a", "b")
)
Here is the code I use to make the plot:
# prep the data
data <- data %>%
  group_by(shared_label) %>%
  mutate(freq = n())
data <- reshape2::melt(data, id.vars = c("unique_alluvium_entires", "freq"))
data$variable <- factor(data$variable, levels = c("label_1", "shared_label", "label_2"))
# ggplot
ggplot(data,
       aes(x = variable, stratum = value, alluvium = unique_alluvium_entires,
           y = freq, fill = value, label = value)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(color = "grey", width = 1/4, na.rm = TRUE) +
  geom_text(stat = "stratum", size = 4) +
  theme_void() +
  theme(
    axis.text.x = element_text(size = 12, face = "bold")
  )
(apparently I cannot embed images yet)
As you can see, I can remove the NA values, but the shared_label column does not properly "stack". Each unique row should stack on top of the others in the shared_label column. This would also fix the sizing issue so that they are equal size along the y axis.
Any ideas how to fix this? I have tried ggsankey, but the same issue arises and I cannot remove the NA values. Any tips are greatly appreciated!
This plot is the expected result of the "flow" statistical transformation, which is the default for the "flow" graphical object. (That is, geom_flow() is equivalent to geom_flow(stat = "flow").) It looks like what you want is to specify the "alluvium" statistical transformation instead. Below I've used all of your code but only copied and edited the ggplot() call.
# ggplot
ggplot(data,
       aes(x = variable, stratum = value, alluvium = unique_alluvium_entires,
           y = freq, fill = value, label = value)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow(stat = "alluvium") + # <-- specify alternate stat
  geom_stratum(color = "grey", width = 1/4, na.rm = TRUE) +
  geom_text(stat = "stratum", size = 4) +
  theme_void() +
  theme(
    axis.text.x = element_text(size = 12, face = "bold")
  )
#> Warning: Removed 2 rows containing missing values (geom_text).
Created on 2021-12-10 by the reprex package (v2.0.1)
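As a side check on the data prep step, here is a small base-R sketch (no ggalluvial needed) that reproduces the freq column from the question and tallies what each axis contains. The names stratum_height and na_per_axis are made up for illustration, and the ave() call is only my stand-in for the group_by() / mutate(freq = n()) step, not something from the original code. It may help when reasoning about the heights along the y axis, since every alluvium contributes y = freq.

```r
# Same toy data as in the question (column name kept as spelled there).
data <- data.frame(
  unique_alluvium_entires = 1:10,
  label_1 = c("A", "B", "C", "D", "E", rep(NA, 5)),
  label_2 = c(rep(NA, 5), "F", "G", "H", "I", "J"),
  shared_label = c("a", "b", "c", "c", "c", "c", "c", "a", "a", "b")
)

# freq = number of rows that share the same shared_label
# (assumed base-R equivalent of group_by(shared_label) %>% mutate(freq = n())).
data$freq <- ave(seq_along(data$shared_label), data$shared_label, FUN = length)

# Total y contributed to each shared_label stratum when every row uses y = freq.
# (hypothetical helper name, for illustration only)
stratum_height <- tapply(data$freq, data$shared_label, sum)
print(stratum_height)

# Number of NA entries in each axis column before melting.
# (hypothetical helper name, for illustration only)
na_per_axis <- colSums(is.na(data[c("label_1", "shared_label", "label_2")]))
print(na_per_axis)
```

If equal-sized rows are what you are after, mapping y to a constant instead of freq might be worth trying, though I have not checked that against the plot above.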