Tags: r, matrix, graph, igraph, adjacency-matrix

Create an adjacency matrix or edge list from a data frame based on shared properties


I have a data frame in which items (nodes) repeat across rows. Each row states that a node has a certain property, so nodes that appear with the same property are related to each other. I would like to express this relation as a graph.

property node
red      A
red      B
red      C
blue     A
blue     D
purple   A
purple   B

A, B and C are connected to each other because they share the red property, and A and D are connected because they share the blue property. A and B also share the purple property. Edges between nodes that share more than one property can be weighted by the number of shared properties; for example, the A-B edge would have weight 2 (red and purple).
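The weighting described above is a co-occurrence count: if B is the node-by-property incidence matrix (1 when a node has a property), then B %*% t(B) counts, for every pair of nodes, how many properties they share. A minimal base-R sketch, assuming the example table is typed in as a data frame named dta (the object names here are only illustrative):

```r
# Example data from the table above
dta <- data.frame(
  property = c("red", "red", "red", "blue", "blue", "purple", "purple"),
  node     = c("A", "B", "C", "A", "D", "A", "B")
)

# Node-by-property incidence matrix: 1 if the node has the property
B <- unclass(table(dta$node, dta$property))  # plain integer matrix with dimnames
B[B > 1] <- 1                                 # guard against duplicated rows

# Number of shared properties for every pair of nodes
M <- tcrossprod(B)   # same as B %*% t(B)
diag(M) <- 0         # a node is not linked to itself
# e.g. M["A", "B"] is 2: A and B share red and purple
M
```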

My question is: how can I conveniently express this relationship in R and obtain an adjacency matrix, or simply a list of edges?

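# assumes the table above is stored in a data frame dta with columns property and node
nodes <- sort(unique(dta$node))
adj <- matrix(0, length(nodes), length(nodes),
              dimnames = list(nodes, nodes))  # initialize a named matrix
for (p in unique(dta$property)) {
  members <- dta$node[dta$property == p]  # nodes that have property p
  adj[members, members] <- adj[members, members] + 1
}
diag(adj) <- 0  # a node is not linked to itself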
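The question also asks for a list of edges. Given a symmetric weighted adjacency matrix such as the one the loop above is meant to fill, each undirected edge can be read off the upper triangle exactly once. A base-R sketch; the matrix is typed in by hand from the example table (A-B share two properties; A-C, A-D and B-C share one each), and the names adj and edges are only illustrative:

```r
nodes <- c("A", "B", "C", "D")
# Weighted adjacency matrix implied by the example table (number of shared properties)
adj <- matrix(c(0, 2, 1, 1,
                2, 0, 1, 0,
                1, 1, 0, 0,
                1, 0, 0, 0),
              nrow = 4, byrow = TRUE, dimnames = list(nodes, nodes))

# Non-zero entries above the diagonal: one row per undirected edge
idx <- which(upper.tri(adj) & adj > 0, arr.ind = TRUE)
edges <- data.frame(
  from   = rownames(adj)[idx[, "row"]],
  to     = colnames(adj)[idx[, "col"]],
  weight = adj[idx]
)
edges
```

Using the upper triangle avoids listing each pair twice (A-B and B-A), which is what you want for an undirected edge list.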

Solution

  • Read your data:

    dta <- read.table(header = TRUE, stringsAsFactors = FALSE, 
              textConnection("property node
    red      A
    red      B
    red      C
    blue     A
    blue     D
    purple   A
    purple   B"))
    

    Create the edges by joining the data to itself on property: every pair of nodes that shares a property becomes one row.

    library(dplyr)
    
    # Create edges by linking the vertices to each other through their shared properties.
    # relationship = "many-to-many" (dplyr >= 1.1.0) marks the expected
    # many-to-many join and silences the warning; drop it on older dplyr.
    dta <- full_join(dta, dta, by = "property", relationship = "many-to-many") %>% 
      # We no longer need property -> remove
      select(-property) %>% 
      # Don't allow self-loops
      filter(node.x != node.y) %>% 
      # Aggregate duplicate edges: vertices linked through multiple properties
      group_by(node.x, node.y) %>% 
      summarise(weight = n(), .groups = "drop")
    

    Now that we have a data frame of edges, we can create the graph:

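    library(igraph)
    # Create graph; the edge list holds both directions (A -> B and B -> A),
    # so treating it as directed still yields a symmetric adjacency matrix
    g <- graph_from_data_frame(dta, directed = TRUE)
    # Weighted adjacency matrix (sparse by default; use sparse = FALSE for a base matrix)
    M <- as_adjacency_matrix(g, attr = "weight")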
    

    Alternatively, you can get the adjacency matrix without igraph by reshaping the edge list into wide format:

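    library(tidyr)
    # Wide format: one row per node.x, one column per node.y, 0 where no edge exists.
    # The result is a data frame (node.x in column 1); pivot_wider() now supersedes spread().
    M2 <- spread(dta, node.y, weight, fill = 0)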
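For comparison, the same self-join can be sketched with base R only, in case dplyr and tidyr are not available. Here merge() stands in for full_join(), aggregate() for group_by() plus summarise(), and xtabs() builds the matrix. The data are re-entered so the block runs on its own, and the names pairs, edges and M3 are only illustrative:

```r
dta <- data.frame(
  property = c("red", "red", "red", "blue", "blue", "purple", "purple"),
  node     = c("A", "B", "C", "A", "D", "A", "B")
)

# Self-join on property: every pair of nodes that share a property
pairs <- merge(dta, dta, by = "property")       # columns: property, node.x, node.y
pairs <- pairs[pairs$node.x != pairs$node.y, ]  # drop self-loops

# One row per ordered pair, weighted by the number of shared properties
pairs$weight <- 1
edges <- aggregate(weight ~ node.x + node.y, data = pairs, FUN = sum)

# Weighted adjacency matrix; pairs that never share a property get 0
M3 <- xtabs(weight ~ node.x + node.y, data = edges)
M3
```

Because xtabs() fills every combination of node.x and node.y, the zeros (including the diagonal) come for free, much like fill = 0 in spread().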