Background
I have been working on an alluvial plot (a kind of Sankey diagram) using ggplot2 and the ggalluvial package to visualize how frequencies change over time and where those changes originate.
As an example, I have created a simple dataset of 100 imaginary patients who are screened for COVID-19. At baseline, all patients are negative. After, say, one week, all patients are tested again: now 30 patients are positive, 65 are negative and 5 have an inconclusive result. Another week later, the 30 positive patients remain positive, 10 patients go from negative to positive, and the remaining 60 are negative.
data <- data.frame(analysis = as.factor(rep(c("time0", "time1", "time2"), each = 4)),
                   freq     = rep(c(30, 10, 55, 5), 3),   # patients per track
                   track    = rep(1:4, 3),                # alluvium id, constant over time
                   response = c("neg", "neg", "neg", "neg",
                                "pos", "neg", "neg", "inconc",
                                "pos", "pos", "neg", "neg"))
#    analysis freq track response
# 1     time0   30     1      neg
# 2     time0   10     2      neg
# 3     time0   55     3      neg
# 4     time0    5     4      neg
# 5     time1   30     1      pos
# 6     time1   10     2      neg
# 7     time1   55     3      neg
# 8     time1    5     4   inconc
# 9     time2   30     1      pos
# 10    time2   10     2      pos
# 11    time2   55     3      neg
# 12    time2    5     4      neg
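To check that this table matches the description above (30/65/5 after one week, 40/60 after two), the `freq` column can be summed per result and time point. A minimal sketch using base R's `xtabs()`; the object name `counts` is only for illustration:

```r
# Example data from the question: one row per track per time point
data <- data.frame(analysis = as.factor(rep(c("time0", "time1", "time2"), each = 4)),
                   freq     = rep(c(30, 10, 55, 5), 3),
                   track    = rep(1:4, 3),
                   response = c("neg", "neg", "neg", "neg",
                                "pos", "neg", "neg", "inconc",
                                "pos", "pos", "neg", "neg"))

# Sum freq over tracks: rows are results, columns are time points
counts <- xtabs(freq ~ response + analysis, data = data)
print(counts)
# Expected totals: time0 = 100 neg; time1 = 30 pos, 65 neg, 5 inconc;
# time2 = 40 pos, 60 neg (every column sums to 100 patients)
```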
Goal
The goal is to create an alluvial plot that shows the 'tracks' (i.e., alluvia) of these patients over time and, thereby, where the results after two weeks came from.
Attempt
I managed to make most of the figure:
library(tidyverse)   # loads ggplot2
library(ggalluvial)
# color = "black" passed to ggplot() itself is ignored; set it on the stratum layer instead
ggplot(data, aes(x = analysis, stratum = response, alluvium = track, y = freq, fill = response)) +
  geom_flow(stat = "alluvium") +
  geom_stratum(alpha = .5, color = "black") +
  scale_fill_manual(values = c(inconc = "grey", neg = "green", pos = "red"))
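Within ggalluvial itself, here is a sketch that does not add gaps but makes adjacent strata easier to tell apart: outline each stratum in black and print its name inside it with `geom_text(stat = "stratum")`, an idiom used in the ggalluvial documentation. It assumes ggplot2 3.3.0 or later (for `after_stat()`); the object name `p` is only for illustration.

```r
library(ggplot2)
library(ggalluvial)

# Example data from the question (lodes form: one row per track per time point)
data <- data.frame(
  analysis = factor(rep(c("time0", "time1", "time2"), each = 4)),
  freq     = rep(c(30, 10, 55, 5), 3),
  track    = rep(1:4, 3),
  response = c("neg", "neg", "neg", "neg",
               "pos", "neg", "neg", "inconc",
               "pos", "pos", "neg", "neg")
)

p <- ggplot(data, aes(x = analysis, stratum = response, alluvium = track,
                      y = freq, fill = response)) +
  geom_flow(stat = "alluvium", alpha = 0.5) +
  # Black outlines mark the boundary between strata that touch each other
  geom_stratum(color = "black") +
  # Print the stratum name (neg / pos / inconc) inside each box
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_fill_manual(values = c(inconc = "grey", neg = "green", pos = "red"))

print(p)
```

The strata are still stacked without gaps, so this only helps readability; it does not change the layout.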
Question
However, I cannot clearly distinguish the strata from one another: they are stacked directly on top of each other, so each axis looks like one completely filled rectangle.
How do you add space between the strata/alluvia in an alluvial plot using the ggalluvial package in R?
The author of the ggalluvial package defines alluvial plots as:
You probably want a Sankey plot instead; a reasonable package for that is ggsankey.
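Following the ggsankey suggestion, a sketch of how the example data could be reshaped and plotted with gaps between nodes. Assumptions: ggsankey is installed from GitHub (davidsjoberg/ggsankey), and its `make_long()` expects one row per individual with one column per stage, so the weighted tracks are first expanded to 100 patient rows with `tidyr::uncount()`. The `space` argument of `geom_sankey()` is assumed to set the vertical gap between nodes; check `?geom_sankey` for its units in your installed version.

```r
library(tidyr)
library(ggplot2)
library(ggsankey)  # assumption: installed via remotes::install_github("davidsjoberg/ggsankey")

# Example data from the question (one row per track per time point)
data <- data.frame(analysis = factor(rep(c("time0", "time1", "time2"), each = 4)),
                   freq     = rep(c(30, 10, 55, 5), 3),
                   track    = rep(1:4, 3),
                   response = c("neg", "neg", "neg", "neg",
                                "pos", "neg", "neg", "inconc",
                                "pos", "pos", "neg", "neg"))

# Wide form: one row per track with columns time0/time1/time2,
# then repeat each track `freq` times to get one row per patient (100 rows)
patients <- data |>
  pivot_wider(id_cols = c(track, freq),
              names_from = analysis, values_from = response) |>
  uncount(freq)

# make_long() builds the x / node / next_x / next_node columns used by geom_sankey()
long <- make_long(patients, time0, time1, time2)

ggplot(long, aes(x = x, next_x = next_x, node = node, next_node = next_node,
                 fill = factor(node))) +
  # assumption: `space` controls the vertical gap between nodes (see ?geom_sankey)
  geom_sankey(flow.alpha = 0.5, node.color = "black", space = 10) +
  scale_fill_manual(values = c(inconc = "grey", neg = "green", pos = "red")) +
  theme_sankey()
```

Expanding to one row per patient is the design choice here because it avoids relying on any weighting argument in `make_long()`; with 100 patients the extra rows are negligible.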