
Directed Graph using networkD3 and a data frame


Below is a data frame containing a very small subset of the data I am using. I want to build an interactive network graph because the full data set has a large number of nodes.

library(networkD3)
screenName <- c("ZV8Lxypirmo2T8z", "Zwoodbutcher", "zX3GZYH7Ea5FKhx", "zXZK7fkzrpPpJdb",
                "ZyaTheKing", "zzzcccbbbmmm")
mention <- c("GianCavallotto:", "IanPTrait:", "JahovasWitniss:", "Veachtravis:",
             "visecs:", "Charles_HRH:")
n <- c(1L, 1L, 1L, 1L, 1L, 1L)
data <- data.frame(screenName, mention, n)
simpleNetwork(data)

The code above produces an interactive, undirected network graph. I came across the forceNetwork() function in the networkD3 R package, which might help here, but I do not know how to convert the data frame into the form this function expects. Thank you in advance!


Solution

  • The functions simpleNetwork() and forceNetwork() are designed to work differently.

    simpleNetwork() takes one data frame as its primary input and, by default, assumes that the first column is the 'source' of each link and that the second column is the 'target' of each link. It does not require a data frame describing the nodes: it assumes the only nodes are those that appear in the links data frame, and it builds the node list internally from the unique values found there.
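
    To illustrate the idea in base R (a sketch of the concept only, not networkD3's actual internal code): the node list can be recovered from the link data alone. The toy data and the column names source and target are made up for this example.

    ```r
    # Toy link table; the column names are illustrative, not required by networkD3.
    links <- data.frame(
      source = c("A", "A", "B"),
      target = c("B", "C", "C"),
      stringsAsFactors = FALSE
    )

    # Every node appears in at least one link, so the unique values
    # across both columns give the full node list.
    implied_nodes <- unique(c(links$source, links$target))
    print(implied_nodes)
    ```

    A node with no links never shows up in such a list, which is one reason to prefer a separate nodes data frame when isolated nodes matter.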

    forceNetwork() is more powerful and flexible, but it requires two data frames: one for links (the Links parameter) and one for nodes (the Nodes parameter). The nodes data frame contains one row per unique node. The parameters NodeID and Group are character values naming the columns of the nodes data frame that hold that information, e.g. NodeID = 'name' and Group = 'type'. The Group column is used to color the nodes. You may not care about grouping, but forceNetwork() requires the parameter, so you can simply add a column to the nodes data frame that has the same value in every row, e.g. 1. The source and target columns of the links data frame must not contain node names; they must contain the zero-based row index of each node in the nodes data frame.
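
    Because forceNetwork() expects zero-based indices in the links data frame, a quick check before plotting can catch the common off-by-one mistake. The sketch below uses a hypothetical helper, check_links(), which is not part of networkD3, along with made-up toy data.

    ```r
    # Hypothetical helper (not part of networkD3): checks that the source and
    # target columns hold whole-number, zero-based row indices into `nodes`.
    check_links <- function(links, nodes, source, target) {
      idx <- c(links[[source]], links[[target]])
      is.numeric(idx) &&
        !anyNA(idx) &&
        all(idx == floor(idx)) &&
        all(idx >= 0 & idx <= nrow(nodes) - 1)
    }

    nodes <- data.frame(name = c("A", "B", "C"), group = 1)

    good <- data.frame(from = c(0, 0, 1), to = c(1, 2, 2))  # zero-based: valid
    bad  <- data.frame(from = c(1, 1, 2), to = c(2, 3, 3))  # one-based: 3 is out of range

    check_links(good, nodes, "from", "to")
    check_links(bad, nodes, "from", "to")
    ```

    With three nodes the valid indices are 0, 1 and 2, so the one-based table fails the check because it refers to index 3.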

    Starting from the code in your question, you can build the two data frames that forceNetwork() needs, for example like this:

    library(networkD3)
    screenName <- c("ZV8Lxypirmo2T8z", "Zwoodbutcher", "zX3GZYH7Ea5FKhx", 
                    "zXZK7fkzrpPpJdb", "ZyaTheKing", "zzzcccbbbmmm")
    mention <- c("GianCavallotto:", "IanPTrait:", "JahovasWitniss:", "Veachtravis:", 
                 "visecs:", "Charles_HRH:")
    n <- c(1L, 1L, 1L, 1L, 1L, 1L)

    # Nodes: one row per unique name; every node gets the same dummy group
    nodeFactors <- factor(sort(unique(c(screenName, mention))))
    nodes <- data.frame(name = nodeFactors, group = 1)

    # Links: replace each name with its zero-based row index in `nodes`
    # (match() is one-based, hence the "- 1")
    screenName <- match(screenName, levels(nodeFactors)) - 1
    mention <- match(mention, levels(nodeFactors)) - 1
    links <- data.frame(screenName, mention, n)

    forceNetwork(Links = links, Nodes = nodes, Source = 'screenName', 
                 Target = 'mention', Value = 'n', NodeID = 'name', Group = 'group')
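
    Since the question asks for a directed graph: forceNetwork() has an arrows argument (FALSE by default) that draws arrowheads on the links. The following is a minimal sketch on made-up toy data, assuming an installed networkD3 version that supports arrows.

    ```r
    library(networkD3)

    # Toy data: node names and zero-based link indices are illustrative.
    nodes <- data.frame(name = c("A", "B", "C"), group = 1)
    links <- data.frame(source = c(0, 0, 1), target = c(1, 2, 2), value = 1)

    # arrows = TRUE draws an arrowhead on each link, pointing from Source to Target.
    directed <- forceNetwork(Links = links, Nodes = nodes, Source = "source",
                             Target = "target", Value = "value", NodeID = "name",
                             Group = "group", arrows = TRUE)

    # Type `directed` at the console (or print it) to render the widget.
    ```

    Applied to the data frames built above, where screenName is the source and mention is the target, the arrows would point from each screen name to the account it mentions.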