
Creating Nodes and Edges Dataframes from Tidy Dataframes


I have a data frame that's of this structure:

df <- data.frame(var1 = c(1,1,1,2,2,3,3,3,3),
                 cat1 = c("A","B","D","B","C","D","E","B","A"))

> df
  var1 cat1
1    1    A
2    1    B
3    1    D
4    2    B
5    2    C
6    3    D
7    3    E
8    3    B
9    3    A

I am looking to create both nodes and edges data frames from it, so that I can draw a network graph using visNetwork. The network should show the number (strength) of connections between the different cat1 values, where two cat1 values are connected whenever they share a var1 value.

I have the nodes data frame sorted:

nodes <- data.frame(id = unique(df$cat1))
> nodes
  id
1  A
2  B
3  D
4  C
5  E

What I'd like help with is how to process df as follows: for each distinct value of var1, take every pair of cat1 values that occur with that var1, then count how many var1 values each pair shares. The result should be an edges data frame like the one below. Note that I'm not bothered about the direction of flow along the edges; all I need is that the two nodes are connected.

> edges
  from to value
1    A  B     2
2    A  D     2
3    A  E     1
4    B  C     1
5    B  D     2
6    B  E     1
7    D  E     1

With thanks in anticipation, Nevil

Update: I found a similar problem and adapted its code to give the following. It is close to what I want but not quite there: it lists each pair once per var1 group, but does not yet count the pairs across groups.

    library(dplyr)

    df %>%
      group_by(var1) %>%
      filter(n() >= 2) %>%   # combn() needs at least two items per group
      do(data.frame(t(combn(.$cat1, 2, function(x) sort(x))),
                    stringsAsFactors = FALSE))

# A tibble: 10 x 3
# Groups:   var1 [3]
    var1 X1    X2   
   <dbl> <chr> <chr>
 1    1. A     B    
 2    1. A     D    
 3    1. B     D    
 4    2. B     C    
 5    3. D     E    
 6    3. B     D    
 7    3. A     D    
 8    3. B     E    
 9    3. A     E    
10    3. A     B  
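
One way to finish this attempt is to count how often each sorted pair appears. The following is a minimal base R sketch, not a definitive implementation: it assumes every var1 group of interest has at least two distinct cat1 values, and the names `groups` and `pairs` are introduced here purely for illustration.

```r
df <- data.frame(var1 = c(1,1,1,2,2,3,3,3,3),
                 cat1 = c("A","B","D","B","C","D","E","B","A"),
                 stringsAsFactors = FALSE)

# Sorted, de-duplicated cat1 values for each var1 group
groups <- lapply(split(df$cat1, df$var1), function(x) sort(unique(x)))
groups <- groups[lengths(groups) >= 2]  # combn() needs at least two items

# One row per unordered pair within each group, stacked into a single matrix
pairs <- do.call(rbind, lapply(groups, function(x) t(combn(x, 2))))
pairs <- data.frame(from = pairs[, 1], to = pairs[, 2],
                    value = 1L, stringsAsFactors = FALSE)

# Sum the per-group occurrences of each pair to get the edge weights
edges <- aggregate(value ~ from + to, data = pairs, FUN = sum)
edges <- edges[order(edges$from, edges$to), ]
rownames(edges) <- NULL
edges
```

Because each group is sorted before `combn()` is called, a pair such as B and D is always stored as from = B, to = D, so the two orientations are never counted separately.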

Solution

  • I don't know whether there is already a suitable function for this task, so here is a detailed procedure to do it. With this, you should be able to define your own function. Hope it helps!

    # create an adjacency matrix
    mat <- table(df)
    mat <- t(mat) %*% mat 
    as.table(mat) # look at your adjacency matrix
    # since the network is not directed, we can consider only the (strictly) upper triangular matrix 
    mat[lower.tri(mat, diag = TRUE)] <- 0
    as.table(mat) # look at the new adjacency matrix
    
    library(dplyr)
    edges <- as.data.frame(as.table(mat))
    edges <- filter(edges, Freq != 0)
    colnames(edges) <- c("from", "to", "value")
    edges <- arrange(edges, from)
    edges # output
    
    #  from to value
    #1    A  B     2
    #2    A  D     2
    #3    A  E     1
    #4    B  C     1
    #5    B  D     2
    #6    B  E     1
    #7    D  E     1
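
The procedure above could be wrapped in a function along these lines. This is only a sketch in base R: the name `make_edges` and its arguments `group` and `item` are hypothetical, and the step that converts counts to 0/1 is an added assumption, so that a cat1 value repeated within one var1 group is not counted more than once.

```r
make_edges <- function(data, group, item) {
  # Incidence matrix: rows are groups, columns are items, entries are 0/1
  inc <- (unclass(table(data[[group]], data[[item]])) > 0) * 1

  # t(inc) %*% inc counts, for each pair of items, the groups containing both
  co <- t(inc) %*% inc

  # Undirected network: keep only the strictly upper triangle
  co[lower.tri(co, diag = TRUE)] <- 0

  edges <- as.data.frame(as.table(co), stringsAsFactors = FALSE)
  names(edges) <- c("from", "to", "value")
  edges <- edges[edges$value != 0, ]
  edges <- edges[order(edges$from, edges$to), ]
  rownames(edges) <- NULL
  edges
}

df <- data.frame(var1 = c(1,1,1,2,2,3,3,3,3),
                 cat1 = c("A","B","D","B","C","D","E","B","A"))
edges <- make_edges(df, "var1", "cat1")
edges
```

The matrix product works because entry (i, j) of `t(inc) %*% inc` is the sum over groups of `inc[g, i] * inc[g, j]`, which is 1 exactly when group g contains both items.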