I want to display multiple bivariate distributions in R instead of the full multivariate distribution. The following code, taken from the vignette of the alluvial package, uses an alluvial plot to display the full multivariate distribution of (Class, Sex, Age, Survived) for the Titanic dataset.
require(dplyr)
require(alluvial)
tit <- as.data.frame(Titanic, stringsAsFactors = FALSE)
head(tit)
alluvial(tit[,1:4], freq=tit$Freq,
col = ifelse(tit$Survived == "Yes", "orange", "grey"),
border = ifelse(tit$Survived == "Yes", "orange", "grey"),
hide = tit$Freq == 0,
cex = 0.7)
Instead of visualising the full multivariate distribution, I would like to visualise the bivariate distributions (Class, Sex), (Sex, Age), and (Age, Survived) using a single alluvial plot. The counts for the three bivariate distributions are
tit %>% group_by(Class, Sex) %>% summarize(Freq = sum(Freq)) %>% ungroup()
tit %>% group_by(Sex, Age) %>% summarize(Freq = sum(Freq)) %>% ungroup()
tit %>% group_by(Age, Survived) %>% summarize(Freq = sum(Freq)) %>% ungroup()
Do you know if this is feasible using the alluvial package or an alternative one?
For this particular example, using an alluvial plot might seem dubious. But it makes full sense when the variables are ordered and we want to visualise the bivariate distributions of (var1, var2), (var2, var3), ...
To set up three alluvial plots side by side, you can do as follows.
Note that count() is a more compact dplyr equivalent of group_by() + summarise() + ungroup().
library(dplyr)
library(alluvial)
tit <- as.data.frame(Titanic, stringsAsFactors = TRUE)
oldpar <- par(mfrow=c(1, 3)) # set up alignment
with(count(tit, Class, Sex , wt = Freq), alluvial(Class, Sex , freq = n))
with(count(tit, Sex , Age , wt = Freq), alluvial(Sex , Age , freq = n))
with(count(tit, Age , Survived, wt = Freq), alluvial(Age , Survived, freq = n))
par(oldpar) # reset par
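The equivalence between count() and the longer group_by() pipeline can be checked directly. The following is a minimal sketch using the same Titanic data; the names long_form and short_form are illustrative only, and the summary column is called n in both so the two results can be compared.

```r
library(dplyr)

tit <- as.data.frame(Titanic, stringsAsFactors = FALSE)

# Long form: group, sum the weights, then drop the grouping
long_form <- tit %>%
  group_by(Class, Sex) %>%
  summarise(n = sum(Freq)) %>%
  ungroup()

# Compact form: count() with wt sums Freq within each (Class, Sex) group
short_form <- count(tit, Class, Sex, wt = Freq)

# Compare the two results as plain data frames
all.equal(as.data.frame(long_form), as.data.frame(short_form))
```

The two tables are expected to agree; count() names its result column n by default, which is why the alluvial() calls above pass freq = n.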
It's a good habit to reset par every time you modify it, since it is a global graphics setting that will otherwise affect every later plot in your session.
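If the plotting code lives inside a function, one way to make the reset automatic is on.exit(), which runs when the function exits, even after an error. This is a sketch only: plot_bivariate_alluvials is a hypothetical helper name, not something provided by the alluvial package, and it assumes the same tit data frame as above.

```r
library(dplyr)
library(alluvial)

# Hypothetical helper (not part of the alluvial package):
# draws the three bivariate alluvial plots side by side.
plot_bivariate_alluvials <- function(tit) {
  oldpar <- par(mfrow = c(1, 3))    # set up alignment
  on.exit(par(oldpar), add = TRUE)  # restore par when the function exits
  with(count(tit, Class, Sex, wt = Freq), alluvial(Class, Sex, freq = n))
  with(count(tit, Sex, Age, wt = Freq), alluvial(Sex, Age, freq = n))
  with(count(tit, Age, Survived, wt = Freq), alluvial(Age, Survived, freq = n))
  invisible(NULL)
}

tit <- as.data.frame(Titanic, stringsAsFactors = TRUE)
plot_bivariate_alluvials(tit)
```

With this pattern the caller does not have to remember the final par(oldpar) call.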