
Sankey diagram in R (highcharter) customisation - how to change node order, labels, and colors


I want to build a Sankey diagram in R with highcharter that has 3 columns, showing how people move between low and high measurements across 3 different years.

Here is a mock version of my data, along with the code for the Sankey diagram:

library(highcharter)
dat <- cbind(c("1.Low", "1.Low","1.High", "1.High", "2.Low", "2.High", "2.High"), 
             c("2.Low", "2.High", "2.Low", "2.High", "3.High", "3.Low", "3.High"), 
             c(5, 10, 15, 5, 1, 10, 15))
dat<- as.data.frame(dat)
colnames(dat)<- c("from", "to", "weight")
dat$weight<- as.numeric(dat$weight)

hchart(dat, "sankey")

which gets me this Sankey diagram:

[Image: Sankey diagram with 3 columns of "high" and "low"]

I want to do 3 things:

  1. Change the labels to remove the numbers in front of them. I added the numbers to differentiate between the columns; otherwise the diagram would be drawn with only 2 columns (low and high). I don't want them in my final diagram, though.

  2. Reorder the "high" and "low" in the last column.

  3. Make all "highs" the same color, and all "lows" the same color - is that possible?

    • I've seen code that lets you set the color of each category individually while setting the weight as well, but I would like some other way, such as by name, because my actual dataset has more than 100 rows and it's not feasible to set each combination individually.

So far I've been trying to fiddle with the highcharter elements, but I find the documentation very confusing and lacking useful examples for Sankey diagrams.

I've tried these settings, but they don't work. Any and all ideas appreciated.

hchart(dat, "sankey") %>% 
  hc_add_theme(hc_theme_ggplot2()) %>%
  hc_plotOptions(series = list(dataLabels = list( style = list(fontSize = "10px")))) %>% 
  hc_plotOptions(sankey = list(colorByPoint = FALSE,
                               curveFactor = 0.5,
                               linkOpacity = 0.33)) %>% 
  hc_add_series(nodes= list(id = '1.High', color = "green"),
                       list(id = '1.Low', color = "blue"),
                       list(id = '2.High', color = "green"),
                       list(id = '2.Low', color = "blue"),
                       list(id = '3.High', color = "green"),
                       list(id = '3.Low', color = "blue")) 

Solution

  • Here is one approach to achieve your desired result.

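    1. To fix the order, sort your dataset by the to column so that the "Low" rows come first, then by the from column so that the "Low" rows come first. The numeric prefixes are stripped before sorting, so only "Low"/"High" determines the order.
    2. As you already tried, you can set both the colors and the labels via the nodes argument. However, instead of setting these manually for each node, you can use lapply to create the list of individual node options: the color is chosen by whether the id contains "High", and the name is the id with its numeric prefix removed.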
    dat <- data.frame(
      c("1.Low", "1.Low", "1.High", "1.High", "2.Low", "2.High", "2.High"),
      c("2.Low", "2.High", "2.Low", "2.High", "3.High", "3.Low", "3.High"),
      c(5, 10, 15, 5, 1, 10, 15)
    )
    colnames(dat) <- c("from", "to", "weight")
    
    library(highcharter)
    
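    # Sort on the labels with their "1." / "2." prefixes removed;
    # decreasing = TRUE puts "Low" before "High".
    dat <- dat[order(
      gsub("\\d+\\.\\s?", "", dat$to),
      gsub("\\d+\\.\\s?", "", dat$from),
      decreasing = TRUE
    ), ]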
    
    # One options list per node: keep the prefixed id, but display the bare label.
    nodes <- unique(c(dat$from, dat$to)) |>
      lapply(\(x) {
        list(
          id = x,
          color = if (grepl("High", x)) "green" else "blue",
          name = gsub("\\d+\\.\\s?", "", x)
        )
      })
    
    highchart() |>
      hc_add_series(
        data = dat, type = "sankey",
        hcaes(from = from, to = to, weight = weight),
        nodes = nodes
      ) |>
      hc_plotOptions(
        series = list(dataLabels = list(style = list(fontSize = "10px"))),
        sankey = list(
          curveFactor = 0.5,
          linkOpacity = 0.33
        )
      )
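
If there are more than two categories, the if/else on "High" can be swapped for a named lookup. The following is only a sketch in base R: make_nodes and palette are hypothetical names chosen for illustration (they are not part of highcharter), and it assumes every node id has the form "<number>.<label>" as in the mock data.

```r
# Hypothetical helper (not a highcharter function): build the list of node
# options from prefixed ids and a named vector mapping label -> color.
make_nodes <- function(ids, palette) {
  lapply(ids, function(id) {
    label <- sub("^\\d+\\.\\s?", "", id)  # "2.High" -> "High"
    list(
      id = id,                  # must still match the from/to values
      name = label,             # what is displayed on the node
      color = palette[[label]]  # errors if a label has no color assigned
    )
  })
}

# Assumed colors, taken from the example above.
palette <- c(High = "green", Low = "blue")
ids <- c("1.Low", "1.High", "2.Low", "2.High", "3.Low", "3.High")

nodes <- make_nodes(ids, palette)
str(nodes[[2]])
```

The resulting list can be passed as nodes = nodes in the hc_add_series() call above. Using [[ for the lookup means a label without an assigned color stops with an error instead of silently producing a node with no color.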
    
