Tags: r, ggplot2, visualization, sankey-diagram

How to make a Sankey Diagram in R using this structure of data


I have a structured data set on student performance across classes, recording which achievement cohort each student fell into in each class. I want to create a Sankey diagram that visualizes how students' achievement cohorts changed across several classes. My data looks like this:

Course     St_ID    Achievement
Eng101     St_A     Top third
Eng101     St_B     Top third
Eng101     St_C     Middle third
Eng101     St_D     Middle third    
Eng101     St_E     Bottom third
Eng101     St_F     Bottom third
Calc101    St_A     Top third
Calc101    St_B     Bottom third
Calc101    St_C     Bottom third
Calc101    St_D     Top third
Calc101    St_E     Middle third
Calc101    St_F     Middle third
Hist101    St_A     Bottom third
Hist101    St_B     Bottom third
Hist101    St_C     Middle third
Hist101    St_D     Top third
Hist101    St_E     Middle third
Hist101    St_F     Top third

And I want the Sankey diagram to look something like this (not drawn to scale): [image]

How can I do that?


Solution

  • Here's a way to create this type of plot with the ggalluvial package:

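    library(ggalluvial)  # also attaches ggplot2
    
    # df: the question's data, one row per student per course
    ggplot(df,
           aes(x = Course,                # one axis per course
               label = Achievement,       # not drawn unless a text layer is added
               stratum = Achievement,     # stacked blocks at each axis
               alluvium = St_ID,          # one flow per student
               fill = Achievement)) +
      geom_flow(stat = 'alluvium',
                lode.guidance = 'frontback') +
      geom_stratum()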
    

    Created on 2023-06-27 with reprex v2.0.2
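
The `ggplot()` call above assumes a data frame named `df` that is never defined in the answer. As a minimal sketch, the table from the question could be rebuilt as below. The helper names `courses` and `cohorts` are only for illustration, and converting the columns to factors is my assumption, not part of the original answer. It is there because character columns would otherwise likely be ordered alphabetically (Calc101, Eng101, Hist101) rather than in the order given in the question.

```r
# Rebuild the question's table as a data frame named `df`
# (name assumed, so that it matches the ggplot() call above).
courses <- c("Eng101", "Calc101", "Hist101")              # illustrative helper
cohorts <- c("Top third", "Middle third", "Bottom third")  # illustrative helper

df <- data.frame(
  Course = rep(courses, each = 6),
  St_ID = rep(paste0("St_", LETTERS[1:6]), times = 3),
  Achievement = c(
    # Eng101, students A to F
    "Top third", "Top third", "Middle third",
    "Middle third", "Bottom third", "Bottom third",
    # Calc101, students A to F
    "Top third", "Bottom third", "Bottom third",
    "Top third", "Middle third", "Middle third",
    # Hist101, students A to F
    "Bottom third", "Bottom third", "Middle third",
    "Top third", "Middle third", "Top third"
  )
)

# Factors fix the display order of the courses and of the cohorts.
df$Course <- factor(df$Course, levels = courses)
df$Achievement <- factor(df$Achievement, levels = cohorts)
```

With `df` built this way, the plotting code can be run unchanged. Dropping the two `factor()` lines should still produce a plot, just with the default ordering.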