
Sankey diagram in R: How to change the height (Y) of individual sections related to each node?


Problem

How can I change the height of each section (node) of a Sankey diagram? I want to create something like Image 1 below, where the 'gender' section is small, the 'cause' section is large, and the 'age' section is small again:

Image 1

My output looks more like Image 2, where every section (Fuels, Sectors, End uses, Conversion devices) has the same height:

Image 2

Code:

library(ggplot2)
library(ggalluvial)
library(RColorBrewer)

# keep only the columns used below (dftest is the original data, not shown)
dfs <- dftest[, c("Hospital", "Paciente", "Terapia", "Unit")]
alpha <- 1
# stretch the 12-colour Set3 palette to one colour per hospital
getPalette <- colorRampPalette(brewer.pal(12, "Set3"))
colourCount <- length(unique(dfs$Hospital))

ggplot(dfs,
       aes(axis1 = Hospital, axis2 = Paciente, axis3 = Terapia)) +
  # flows between axes, coloured by hospital
  geom_alluvium(aes(fill = Hospital),
                width = 1/12, alpha = alpha, knot.pos = 0.5) +
  # boxes (strata) drawn on each axis
  geom_stratum(width = 1/20) +
  scale_x_continuous(breaks = 1:3, labels = c("Hospital", "Patient", "Therapy")) +
  scale_fill_manual(values = getPalette(colourCount)) +
  ggtitle("Teste") +
  theme_minimal() +
  theme(legend.position = "none", panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), axis.text.y = element_blank(),
        axis.text.x = element_text(size = 12, face = "bold"))

I thought I could create a Sankey diagram similar to Image 1. Below is the dput(dfs) output for a made-up dataset:

dput(dfs)
structure(list(Hospital = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("1", 
"2", "3", "4", "5"), class = "factor"), Paciente = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 
5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L
), .Label = c("21", "22", "23", "24", "25", "26", "27"), class = "factor"), 
    Terapia = structure(c(2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
    3L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 
    1L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
    3L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 
    1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("Adalimumab", 
    "Etanercept", "Infliximab", "Rituximab"), class = "factor"), 
    Unit = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), class = "data.frame", row.names = c(NA, 
-65L))

Can anyone please advise?


Solution

  • I think the ggalluvial package's geoms were not designed for free-floating sections. However, as its creator noted in the package vignette, the ggforce package has something similar, if the following look is what you are going for:

    plot

    Code used:

    library(ggforce)
    
    # transform dataframe into appropriate format
    dfs2 <- gather_set_data(dfs, 1:3)
    
    # define axis-width / sep parameters once here, to be used by
    # each geom layer in the plot
    aw <- 0.1
    sp <- 0.1
    
    ggplot(dfs2, 
           aes(x = x, id = id, split = y, value = Unit)) +
      geom_parallel_sets(aes(fill = Hospital), alpha = 0.3, 
                         axis.width = aw, sep = sp) +
      geom_parallel_sets_axes(axis.width = aw, sep = sp) +
      geom_parallel_sets_labels(colour = "white", 
                                angle = 0, size = 3,
                                axis.width = aw, sep = sp) +
      theme_minimal()
    

    Here are some demonstrations with different parameter values:

    demonstrations
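
A comparison like the one above could be generated along these lines. This is only a sketch: `toy`, `make_plot`, and the particular `axis.width` / `sep` values are made up for illustration and are not the data or settings behind the original image. As I understand the ggforce documentation, `sep` is the proportional gap between the boxes on one axis, so larger values should make axes with more categories spread out further, while `axis.width` sets how wide the boxes are.

```r
library(ggplot2)
library(ggforce)

# Hypothetical toy data standing in for `dfs` (not the asker's dataset)
toy <- data.frame(
  Hospital = factor(c("1", "1", "2", "2", "3")),
  Paciente = factor(c("21", "22", "23", "23", "24")),
  Terapia  = factor(c("Etanercept", "Infliximab", "Etanercept",
                      "Rituximab", "Adalimumab")),
  Unit     = 1
)

# Long format: one row per (original row, axis) pair
toy2 <- gather_set_data(toy, 1:3)

# Build the same plot for a given axis width (aw) and separation (sp)
make_plot <- function(aw, sp) {
  ggplot(toy2, aes(x = x, id = id, split = y, value = Unit)) +
    geom_parallel_sets(aes(fill = Hospital), alpha = 0.3,
                       axis.width = aw, sep = sp) +
    geom_parallel_sets_axes(axis.width = aw, sep = sp) +
    geom_parallel_sets_labels(colour = "white", angle = 0, size = 3,
                              axis.width = aw, sep = sp) +
    theme_minimal() +
    ggtitle(sprintf("axis.width = %.2f, sep = %.2f", aw, sp))
}

# Every combination of two axis widths and two separations
params <- expand.grid(aw = c(0.05, 0.2), sp = c(0.05, 0.3))
plots <- Map(make_plot, params$aw, params$sp)

# Print one plot at a time, e.g. plots[[1]]
```

Passing the same `aw` and `sp` to all three layers keeps the ribbons, boxes, and labels aligned, which is why the answer defines them once up front.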