I would like modify an existing sankey plot using ggplot2
and ggalluvial
to make it more appealing
my example is from https://corybrunson.github.io/ggalluvial/articles/ggalluvial.html
library(ggplot2)
library(ggalluvial)
data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))
ggplot(vaccinations,
aes(x = survey, stratum = response, alluvium = subject,
y = freq,
fill = response, label = response)) +
scale_x_discrete(expand = c(.1, .1)) +
geom_flow() +
geom_stratum(alpha = .5) +
geom_text(stat = "stratum", size = 3) +
theme(legend.position = "none") +
ggtitle("vaccination survey responses at three points in time")
Created on 2020-10-01 by the reprex package (v0.3.0)
Now, I would like to change this plot that it looks similar to a plot from https://sciolisticramblings.wordpress.com/2018/11/23/sankey-charts-the-new-pie-chart/, i.e. 1. change absolute to relative values (percentage) 2. add percentage labels and 3. apply partial fill (e.g. "missing" and "never")
My approach:
I think I could change the axis to percentage with something like: scale_y_continuous(label = scales::percent_format(scale = 100))
However, I am not sure about step 2. and 3.
This could be achieved like so:
Changing to percentages could be achieved by adding a new column to your df with the percentage shares by survey, which can then be mapped on y
instead of freq
.
To get nice percentage labels you can make use of scale_y_continuous(label = scales::percent_format())
For the partial filling you can map e.g. response %in% c("Missing", "Never")
on fill
(which gives TRUE
for "Missing" and "Never") and set the fill colors via scale_fill_manual
The percentages of each stratum can be added to the label via label = paste0(..stratum.., "\n", scales::percent(..count.., accuracy = .1))
in geom_text
where I make use of the variables ..stratum..
and ..count..
computed by stat_stratum
.
library(ggplot2)
library(ggalluvial)
library(dplyr)
data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))
vaccinations <- vaccinations %>%
group_by(survey) %>%
mutate(pct = freq / sum(freq))
ggplot(vaccinations,
aes(x = survey, stratum = response, alluvium = subject,
y = pct,
fill = response %in% c("Missing", "Never"),
label = response)) +
scale_x_discrete(expand = c(.1, .1)) +
scale_y_continuous(label = scales::percent_format()) +
scale_fill_manual(values = c(`TRUE` = "cadetblue1", `FALSE` = "grey50")) +
geom_flow() +
geom_stratum(alpha = .5) +
geom_text(aes(label = paste0(..stratum.., "\n", scales::percent(..count.., accuracy = .1))), stat = "stratum", size = 3) +
theme(legend.position = "none") +
ggtitle("vaccination survey responses at three points in time")