r visualization categorical-data

Can we make an alluvial plot in R to display multiple bivariate distributions instead of the full multivariate distribution?


I want to display multiple bivariate distributions in R instead of the full multivariate distribution. The following code, taken from the vignette of the alluvial package, uses an alluvial plot to display the full multivariate distribution of (Class, Sex, Age, Survived) for the Titanic dataset.

require(dplyr)
require(alluvial)
tit <- as.data.frame(Titanic, stringsAsFactors = FALSE)
head(tit)
alluvial(tit[,1:4], freq=tit$Freq,
     col = ifelse(tit$Survived == "Yes", "orange", "grey"),
     border = ifelse(tit$Survived == "Yes", "orange", "grey"),
     hide = tit$Freq == 0,
     cex = 0.7)


Instead of visualising the full multivariate distribution, I would like to visualise the bivariate distributions (Class, Sex), (Sex, Age), and (Age, Survived) in a single alluvial plot. The counts for the three bivariate distributions are:

tit %>% group_by(Class, Sex) %>% summarize(Freq = sum(Freq)) %>% ungroup()
tit %>% group_by(Sex, Age) %>% summarize(Freq = sum(Freq)) %>% ungroup()
tit %>% group_by(Age, Survived) %>% summarize(Freq = sum(Freq)) %>% ungroup()

Do you know if this is feasible with the alluvial package or with an alternative one?

For this particular example, using an alluvial plot might seem dubious. However, it makes sense when the variables are ordered and we want to visualise the bivariate distributions of (var1, var2), (var2, var3), ...


Solution

  • To draw the three alluvial plots side by side, you can use par(mfrow = ...) as in the code below.

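    Note that count() is dplyr's more compact equivalent of group_by() + summarise() + ungroup(). With wt = Freq it sums the Freq column within each group and stores the result in a column named n.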
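
    As a quick check of that equivalence, here is a minimal sketch comparing the two forms for (Class, Sex). The names long_form and short_form are only illustrative, and the .groups argument assumes dplyr 1.0.0 or later.

    ```r
    library(dplyr)

    tit <- as.data.frame(Titanic, stringsAsFactors = TRUE)

    # Long form: group, sum the frequencies, then drop the grouping
    long_form <- tit %>%
      group_by(Class, Sex) %>%
      summarise(n = sum(Freq), .groups = "drop")

    # Compact form: count() with Freq as the weight column
    short_form <- count(tit, Class, Sex, wt = Freq)

    # Compare the two results as plain data frames
    all.equal(as.data.frame(long_form), as.data.frame(short_form))
    ```

    If the two forms agree, all.equal() should return TRUE.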

    library(dplyr)
    library(alluvial)
    
    tit <- as.data.frame(Titanic, stringsAsFactors = TRUE)
    
    oldpar <- par(mfrow=c(1, 3)) # set up alignment
    
    with(count(tit, Class, Sex     , wt = Freq), alluvial(Class, Sex     , freq = n))
    with(count(tit, Sex  , Age     , wt = Freq), alluvial(Sex  , Age     , freq = n))
    with(count(tit, Age  , Survived, wt = Freq), alluvial(Age  , Survived, freq = n))
    
    par(oldpar) # reset par
    


    It's good practice to reset par() every time you modify it, since it is a global graphics setting that may affect other parts of your code.
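
    If the plotting code lives inside a function, one way to make the reset automatic is on.exit(), which runs even when a plotting call fails. The sketch below is only an illustration: plot_bivariate_alluvials is a hypothetical helper, not part of the alluvial package, and it assumes the data frame has the same columns as tit.

    ```r
    library(dplyr)
    library(alluvial)

    tit <- as.data.frame(Titanic, stringsAsFactors = TRUE)

    # Hypothetical helper: draws the three bivariate alluvial plots side by side
    plot_bivariate_alluvials <- function(data) {
      oldpar <- par(mfrow = c(1, 3))    # one row, three panels
      on.exit(par(oldpar), add = TRUE)  # restore par when the function exits

      with(count(data, Class, Sex, wt = Freq), alluvial(Class, Sex, freq = n))
      with(count(data, Sex, Age, wt = Freq), alluvial(Sex, Age, freq = n))
      with(count(data, Age, Survived, wt = Freq), alluvial(Age, Survived, freq = n))

      invisible(NULL)
    }

    plot_bivariate_alluvials(tit)
    ```

    After the call, par("mfrow") is back to whatever it was beforehand, so later plots are not squeezed into a three-panel layout.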