Sankey Diagram Blank in R


I am attempting to produce a Sankey plot in R following the example from https://www.youtube.com/watch?v=S6me1r6RI4I but I am struggling with troubleshooting my input data. I am not getting any error messages, only a blank plot.

First, my code and data are available at the following link: https://github.com/CLTeed/cautious-octo-enigma

library(d3Network)

Key<-read.csv(file="Cat_Key.csv", header=TRUE)
Sankey_data<-read.csv(file="Sankey_data.csv", header=TRUE)

#Convert to numeric
Sankey_data2 <- as.data.frame(sapply(Sankey_data, as.numeric))

#Remove any NA values
Sankey_data3 <- Sankey_data2[!is.na(Sankey_data2$Value), ]

#Convert Key to numeric
Key$Num <- sapply(Key$Num, as.numeric)
#drop the extra Key column
Key1 <- Key[, -ncol(Key)]

#Generate the plot
sankeyNetwork(Links = Sankey_data3, Nodes = Key1, Source = "Source",
              Target = "Sink", Value = "Value", NodeID = "Name", 
              iterations = 32)

I first checked that my indexing was correct and zero-indexed. When I tried adjusting it, I got an error saying it was not zero-indexed, so I think I had it right the first time. (This is the only error message I've received, and I don't get it with the code above.) I modified my input data so that the Source, Sink, and Value columns are all numeric. I made the numbers in Key numeric. I removed any duplicated rows. I removed any rows with an NA in the "Value" column. I tried different numbers of iterations (up to 32). I tried a dummy dataset, which worked. I've tried viewing the plot in a new window and in a Zoom window. I'm really not sure what to try next.

Here's what my data looks like:

> str(Sankey_data3)
'data.frame':   178 obs. of  3 variables:
 $ Source: int  0 1 2 3 4 5 6 7 8 9 ...
 $ Value : num  3 3 2 1 4 1 1 1 2 1 ...
 $ Sink  : int  48 48 48 48 48 48 48 49 50 51 ...

> str(Key)
'data.frame':   166 obs. of  3 variables:
 $ Num     : num  0 1 2 3 4 5 6 7 8 9 ...
 $ Name    : chr  "L-/M+:L+/M-" "L-/M+:L+M+" "L-:L-/M+" "L-:M+" ...
 $ Category: chr  "HisCat SP" "HisCat SP" "HisCat SP" "HisCat SP" ...

I'd appreciate any other suggestions for what to try!


Solution

  • There was an NA in your Source column. This check exposes the issue:

    # Check if all Source and Sink values have corresponding Num values in Key
    all(unique(c(Sankey_data3$Source, Sankey_data3$Sink)) %in% Key$Num)
    

    This returned FALSE, meaning there were Source values (NA) that were not in Key. It seems that sankeyNetwork does not handle this case and fails silently.
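To see which values and rows are responsible, the same check can be extended a little. This is only a sketch on made-up data: `links` and `nodes` are hypothetical stand-ins for Sankey_data3 and Key, and their values are not taken from the real CSV files.

```r
# Toy stand-ins for the real data (hypothetical values, not from Sankey_data.csv)
links <- data.frame(Source = c(0, 1, NA, 2),
                    Value  = c(3, 3, 2, 1),
                    Sink   = c(3, 3, 3, 4))
nodes <- data.frame(Num = 0:4, Name = c("a", "b", "c", "d", "e"))

# Node indices used by the links that have no match in the key
# (NA never matches, so it is reported as missing)
ids <- unique(c(links$Source, links$Sink))
missing_ids <- ids[!ids %in% nodes$Num]

# Rows of the links table that reference a missing index
bad_rows <- links[!(links$Source %in% nodes$Num) | !(links$Sink %in% nodes$Num), ]

print(missing_ids)
print(bad_rows)
```

With the real data, swapping in Sankey_data3 and Key should point to the row(s) that need fixing or dropping.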

    Correct Code

    library(networkD3)
    library(dplyr)
    library(tidyr)
    
    Key <- read.csv(file = "Cat_Key.csv", header = TRUE) %>% select(Num, Name) # keep only the columns needed for the plot
    Sankey_data <- read.csv(file = "Sankey_data.csv", header = TRUE) %>% drop_na() # drop rows with an NA in any column
    
    #Generate the plot
    sankeyNetwork(Links = Sankey_data, Nodes = Key, Source = "Source",
                  Target = "Sink", Value = "Value", NodeID = "Name", 
                  iterations = 0)
    

    The plot looks odd in the R viewer, so open it in a browser instead!
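One way to get the plot into a browser is to save the widget as an HTML file and open that. The sketch below assumes the htmlwidgets package is installed alongside networkD3. The toy `nodes` and `links` tables and the output file name are made up for illustration.

```r
library(networkD3)
library(htmlwidgets)

# Toy data (hypothetical); Source/Sink are zero-based row positions in `nodes`
nodes <- data.frame(Name = c("a", "b", "c"))
links <- data.frame(Source = c(0, 1), Sink = c(2, 2), Value = c(3, 1))

p <- sankeyNetwork(Links = links, Nodes = nodes, Source = "Source",
                   Target = "Sink", Value = "Value", NodeID = "Name",
                   iterations = 0)

# Write the widget to an HTML file; the file name is arbitrary.
# selfcontained = FALSE avoids needing pandoc and puts the assets in a side folder.
out_file <- file.path(tempdir(), "sankey.html")
saveWidget(p, file = out_file, selfcontained = FALSE)

# Open it in the default browser when running interactively
if (interactive()) browseURL(out_file)
```

In RStudio, the "Show in new window" button in the Viewer pane should give a similar result without saving a file.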
