I am new to the ggalluvial package. I am working with a dataset of donations that I would like to represent as an alluvial diagram. Below is a sample of the dataset:
donor_ID recip_name donation_amt month_year
<chr> <chr> <dbl> <chr>
1 1 B, P 25 September 2019
2 2 S, B 27 July 2019
3 3 K, A 50 June 2019
4 1 H, K 100 April 2019
5 2 W, E 3 December 2019
6 3 S, B 9 August 2019
7 1 C, J 25 September 2019
8 2 B, J 50 October 2019
9 3 W, E 400 August 2019
10 1 S, B 20 December 2019
The output of dput() on this dataset is as follows:
structure(list(donor_ID = c("1", "2", "3", "1", "2", "3", "1",
"2", "3", "1"), recip_name = c("B, P", "S, B", "K, A", "H, K",
"W, E", "S, B", "C, J", "B, J", "W, E", "S, B"), donation_amt = c(25,
27, 50, 100, 3, 9, 25, 50, 400, 20), month_year = c("September 2019",
"July 2019", "June 2019", "April 2019", "December 2019", "August 2019",
"September 2019", "October 2019", "August 2019", "December 2019"
)), class = "data.frame", row.names = c(NA, -10L))
I want to show how each donor's choice of recipient (recip_name) changes from month to month (donor preference), where donor_ID identifies individual donors. The resulting alluvial diagram should show these changes between months, with flows proportional to the total donation amounts (donation_amt) moving between recipients. Below is the script I have written to accomplish this:
df$recip_name <- as.factor(df$recip_name)
df %>%
filter(transaction_dt < as.Date("2020-01-01")) %>%
select(donor_ID, recip_name, donation_amt, month_year) %>%
ggplot(aes(x = month_year, y = donation_amt, stratum = recip_name,
alluvium = donor_ID, fill = recip_name, label = recip_name)) +
scale_fill_brewer(type = "qual", palette = "Set2") +
geom_flow(stat = "alluvium", color = "darkgray") +
geom_stratum() +
theme_light() +
theme(legend.position = "bottom") +
ggtitle("Donor Preference")
Running this code produces the following error:
Error in f(...) :
Data is not in a recognized alluvial form (see `help('alluvial-data')` for details).
I have searched existing questions and documentation on setting up data for ggalluvial, to no avail. How can I produce the desired alluvial diagram from this data?
Currently the errors thrown by the plot layers are less informative than those thrown by the alluvial structure tests themselves. The tests also use different terms: id for alluvium, key for x, and value for stratum. (I apologize for that! These will be changed in a future release.) Your data are trying to be in lodes (long) form, and the is_lodes_form() test (below) says that there are duplicate id-axis pairings.
I didn't notice earlier, but there are indeed duplicate pairings: two rows have donor_ID = 1 and month_year = September 2019 (rows 1 and 7), and two rows have donor_ID = 3 and month_year = August 2019 (rows 6 and 9). Alluvial plots require that each alluvium (id) pass through each axis at most once. After removing one row from each pair (rows 7 and 9), an alluvial plot does render (below). Presumably because this is only a sample of the data, the plot is sparse.
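One way to list the offending pairings before plotting is to count rows per donor and month. This is only a minimal sketch on the sample data; the name dups is illustrative and not part of ggalluvial.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  donor_ID = c("1", "2", "3", "1", "2", "3", "1", "2", "3", "1"),
  recip_name = c("B, P", "S, B", "K, A", "H, K", "W, E",
                 "S, B", "C, J", "B, J", "W, E", "S, B"),
  donation_amt = c(25, 27, 50, 100, 3, 9, 25, 50, 400, 20),
  month_year = c("September 2019", "July 2019", "June 2019", "April 2019",
                 "December 2019", "August 2019", "September 2019",
                 "October 2019", "August 2019", "December 2019")
)

# Count rows per donor-month pair; any count above 1 is a
# duplicated id-axis pairing that breaks lodes form
dups <- df %>%
  count(donor_ID, month_year) %>%
  filter(n > 1)

dups
```

On the sample, this flags donor 1 in September 2019 and donor 3 in August 2019. How to resolve such rows (drop one, or change the unit of observation) depends on what the plot should show.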
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(stringr)
library(ggalluvial)
#> Loading required package: ggplot2
df <- structure(list(
donor_ID = c("1", "2", "3", "1", "2", "3", "1", "2", "3", "1"),
recip_name = c("B, P", "S, B", "K, A", "H, K", "W, E", "S, B", "C, J", "B, J", "W, E", "S, B"),
donation_amt = c(25, 27, 50, 100, 3, 9, 25, 50, 400, 20),
month_year = c("September 2019", "July 2019", "June 2019", "April 2019", "December 2019", "August 2019", "September 2019", "October 2019", "August 2019", "December 2019")
), class = "data.frame", row.names = c(NA, -10L))
df$recip_name <- as.factor(df$recip_name)
is_lodes_form(df, key = month_year, value = recip_name, id = donor_ID)
#> Duplicated id-axis pairings.
#> [1] FALSE
df %>%
slice(-c(7, 9)) %>%
mutate(month = match(str_remove(month_year, " 2019"), month.name)) %>%
ggplot(aes(x = month_year, y = donation_amt, stratum = recip_name,
alluvium = donor_ID, fill = recip_name, label = recip_name)) +
scale_fill_brewer(type = "qual", palette = "Set2") +
geom_flow(stat = "alluvium", color = "darkgray") +
geom_stratum() +
theme_light() +
theme(legend.position = "bottom") +
ggtitle("Donor Preference")
Created on 2022-01-30 by the reprex package (v2.0.1)
You'll also have to do a few more things to clean up the plot, e.g. turn the character-valued month_year into a factor or date, so that the months are ordered chronologically rather than alphabetically.
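As a rough sketch of the factor route, the months can be given chronological levels built from base R's month.name. It assumes every observation falls in 2019 (an assumption about the full data, not something the sample guarantees), and the names month_levels and month_fct are illustrative.

```r
# Values of month_year from the sample data
month_year <- c("September 2019", "July 2019", "June 2019", "April 2019",
                "December 2019", "August 2019", "September 2019",
                "October 2019", "August 2019", "December 2019")

# Assumption: all observations are from 2019, so twelve levels suffice
month_levels <- paste(month.name, 2019)

# Levels follow calendar order instead of alphabetical order
month_fct <- factor(month_year, levels = month_levels)

# Drop months that never occur in the data
month_fct <- droplevels(month_fct)

levels(month_fct)
```

Assigning the result back to df$month_year (or mapping it to x) should then place the axes in calendar order. Data spanning several years would need levels built per year, or a proper date column.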
If you want to distinguish donations to different recipients from the same donor, then perhaps the unit of observation you want to use is the interaction of donor_ID and recip_name. Passing that to the alluvium aesthetic, recip_name to stratum, and donor_ID to fill might produce the plot you want.
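A hedged sketch of that idea follows. The column name unit and the object names df_unit, ok, and p are hypothetical, and the layers are simply carried over from the question's script, so treat this as a starting point rather than a finished plot.

```r
library(dplyr)
library(ggalluvial)

# Sample data from the question
df <- data.frame(
  donor_ID = c("1", "2", "3", "1", "2", "3", "1", "2", "3", "1"),
  recip_name = c("B, P", "S, B", "K, A", "H, K", "W, E",
                 "S, B", "C, J", "B, J", "W, E", "S, B"),
  donation_amt = c(25, 27, 50, 100, 3, 9, 25, 50, 400, 20),
  month_year = c("September 2019", "July 2019", "June 2019", "April 2019",
                 "December 2019", "August 2019", "September 2019",
                 "October 2019", "August 2019", "December 2019")
)

# Hypothetical column `unit`: one alluvium per donor-recipient pair
df_unit <- df %>%
  mutate(unit = interaction(donor_ID, recip_name, drop = TRUE))

# With this unit, no id meets the same month twice in the sample
ok <- is_lodes_form(df_unit, key = month_year, value = recip_name, id = unit)

# Build the plot object; print `p` to draw it
p <- ggplot(df_unit,
            aes(x = month_year, y = donation_amt, stratum = recip_name,
                alluvium = unit, fill = donor_ID)) +
  geom_flow(stat = "alluvium", color = "darkgray") +
  geom_stratum() +
  theme_light() +
  theme(legend.position = "bottom")
```

In the ten-row sample every donor-recipient pair occurs in a single month, so there is nothing for flows to connect; the full data would need pairs that recur across months for flows to appear.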