I am attempting to produce a Sankey plot in R following the example from https://www.youtube.com/watch?v=S6me1r6RI4I but I am struggling with troubleshooting my input data. I am not getting any error messages, only a blank plot.
First, my code and data are available at the following link: https://github.com/CLTeed/cautious-octo-enigma
library(d3Network)
Key<-read.csv(file="Cat_Key.csv", header=TRUE)
Sankey_data<-read.csv(file="Sankey_data.csv", header=TRUE)
#Convert to numeric
Sankey_data2 <- as.data.frame(sapply(Sankey_data, as.numeric))
#Remove any NA values
Sankey_data3 <- Sankey_data2[!is.na(Sankey_data2$Value), ]
#Convert Key to numeric
Key$Num <- sapply(Key$Num, as.numeric)
#drop the extra Key column
Key1 <- Key[, -ncol(Key)]
#Generate the plot
sankeyNetwork(Links = Sankey_data3, Nodes = Key1, Source = "Source",
Target = "Sink", Value = "Value", NodeID = "Name",
iterations = 32)
I first checked that my indexing was correct and zero-indexed. When I tried adjusting it, I got an error saying it was not zero-indexed, so I think I had it right the first time. (This is the only error message I've received, and I don't get it with the code above.) I modified my input data so that my Source, Sink, and Value columns are all numeric. I made the numbers in Key numeric. I removed any duplicated rows. I removed any rows with an NA in the "Value" column. I tried different numbers of iterations (up to 32). I used a dummy dataset, which worked. I've tried viewing the plot in a new window and in a Zoom window. I'm really not sure what to try next.
Here's what my data looks like:
> str(Sankey_data3)
'data.frame': 178 obs. of 3 variables:
$ Source: int 0 1 2 3 4 5 6 7 8 9 ...
$ Value : num 3 3 2 1 4 1 1 1 2 1 ...
$ Sink : int 48 48 48 48 48 48 48 49 50 51 ...
> str(Key)
'data.frame': 166 obs. of 3 variables:
$ Num : num 0 1 2 3 4 5 6 7 8 9 ...
$ Name : chr "L-/M+:L+/M-" "L-/M+:L+M+" "L-:L-/M+" "L-:M+" ...
$ Category: chr "HisCat SP" "HisCat SP" "HisCat SP" "HisCat SP" ...
I'd appreciate any other suggestions for what to try!
Here you go. There was an NA in your Source column.
This was the exact issue:
# Check if all Source and Sink values have corresponding Num values in Key
all(unique(c(Sankey_data3$Source, Sankey_data3$Sink)) %in% Key$Num)
This returned FALSE, meaning there were Source values (NA) with no matching entry in Key$Num. It seems that sankeyNetwork does not handle this case and fails silently.
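To see which rows make that check fail, something like the following may help. It is a minimal base-R sketch on made-up toy data: `links` and `nodes` are hypothetical stand-ins for Sankey_data3 and Key, and the values are not taken from the repository.

```r
# Toy stand-ins for the real data (hypothetical values, not from the repo)
links <- data.frame(
  Source = c(0L, 1L, NA, 2L),
  Value  = c(3, 3, 2, 1),
  Sink   = c(3L, 3L, 3L, 7L)
)
nodes <- data.frame(Num = 0:3, Name = c("a", "b", "c", "d"))

# Node ids referenced by the links, NA included
ids <- unique(c(links$Source, links$Sink))

# Ids with no matching node; NA is reported here too
missing_ids <- ids[!ids %in% nodes$Num]
print(missing_ids)

# Rows of the links table that reference a missing node
bad_rows <- !(links$Source %in% nodes$Num) | !(links$Sink %in% nodes$Num)
print(links[bad_rows, ])

# Keep only links whose endpoints both exist in the node table
links_ok <- links[!bad_rows, ]
stopifnot(all(c(links_ok$Source, links_ok$Sink) %in% nodes$Num))
```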
library(networkD3)
library(dplyr)
library(tidyr)
Key <- read.csv(file = "Cat_Key.csv", header = TRUE) %>% select(Num, Name) # keep only the columns needed
Sankey_data <- read.csv(file = "Sankey_data.csv", header = TRUE) %>% drop_na() # remove rows with any NA
#Generate the plot
sankeyNetwork(Links = Sankey_data, Nodes = Key, Source = "Source",
Target = "Sink", Value = "Value", NodeID = "Name",
iterations = 0)
The plot looks odd in R's viewer, so open it in a browser instead!