I have a data frame in which each row pairs a node with a property, so a node can appear in several rows. Nodes that appear with the same property are related to each other, and I would like to express this relation as a graph.
property node
red A
red B
red C
blue A
blue D
purple A
purple B
A, B and C would be connected with each other since they share the red property. A and D would form a connection since they share the blue property. Furthermore, A and B share a purple property. We can weight elements that share more than one property; for example, A and B share a purple property in addition to the red property.
My question is: how do I conveniently express this relationship in R and obtain an adjacency matrix, or simply a list of edges? I imagine something along these lines:
adj <- matrix(0, total_nodes, total_nodes)  # initialize an empty adjacency matrix
for (p in property) {
  # some function to fill in the matrix for the nodes sharing property p
}
diag(adj) <- 0  # no self-connections
Read your data:
dta <- read.table(header = TRUE, stringsAsFactors = FALSE,
textConnection("property node
red A
red B
red C
blue A
blue D
purple A
purple B"))
Create edges from your dataset by linking your data to itself on property:
library(dplyr)
# Create edges by linking the vertices to each other via their shared properties.
# Every row matches several rows here, so dplyr >= 1.1.0 warns about a
# many-to-many join; that is expected for this self-join.
dta <- full_join(dta, dta, by = "property") %>%
  # We no longer need property -> remove
  select(-property) %>%
  # Don't allow self-loops
  filter(node.x != node.y) %>%
  # Aggregate duplicate edges: vertices linked through multiple properties
  group_by(node.x, node.y) %>%
  summarise(weight = n(), .groups = "drop")
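The edge list built this way contains every pair twice (A to B and B to A). If you only want a plain list of undirected edges, one possible variation is to keep only the rows where the first node sorts before the second. This is a minimal sketch, not part of the pipeline above; the names `nodes_by_property` and `undirected_edges` are made up for illustration, and it assumes each (property, node) pair occurs at most once in the data.

```r
library(dplyr)

# Same toy data as in the question (hypothetical object name)
nodes_by_property <- data.frame(
  property = c("red", "red", "red", "blue", "blue", "purple", "purple"),
  node     = c("A", "B", "C", "A", "D", "A", "B"),
  stringsAsFactors = FALSE
)

undirected_edges <- nodes_by_property %>%
  # Self-join on property; dplyr >= 1.1.0 warns about a many-to-many join here
  inner_join(nodes_by_property, by = "property") %>%
  # Keep each unordered pair once; this also drops self-loops
  filter(node.x < node.y) %>%
  # Number of shared properties becomes the edge weight
  count(node.x, node.y, name = "weight")

print(undirected_edges)
```

The `node.x < node.y` filter relies on the node labels being comparable as strings, which holds for the single-letter labels used here.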
Now that we have a data.frame with edges we can create the graph:
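library(igraph)
# Create graph. The edge list contains both A -> B and B -> A, so a directed
# graph keeps the adjacency matrix symmetric with each weight counted once.
g <- graph_from_data_frame(dta, directed = TRUE)
# Create adjacency matrix from graph (sparse by default; entries are the edge weights)
M <- as_adjacency_matrix(g, attr = "weight")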
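If you would rather work with an undirected graph, a sketch along the following lines should also work, provided each pair of nodes appears only once in the edge list. The data frame `edge_list` is written out by hand for illustration (it is the de-duplicated version of the edges for the example data), and `sparse = FALSE` is used to get an ordinary base R matrix instead of a sparse one.

```r
library(igraph)

# Hand-written undirected edge list for the example data (illustrative only):
# one row per pair, weight = number of shared properties
edge_list <- data.frame(
  from   = c("A", "A", "A", "B"),
  to     = c("B", "C", "D", "C"),
  weight = c(2, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Extra columns of the data frame (here: weight) become edge attributes
g_und <- graph_from_data_frame(edge_list, directed = FALSE)

# Dense, symmetric adjacency matrix filled with the edge weights
M_und <- as_adjacency_matrix(g_und, attr = "weight", sparse = FALSE)
print(M_und)
```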
Another solution to get the adjacency matrix without using igraph would be:
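library(tidyr)
# spread() still works but is superseded by pivot_wider() in tidyr >= 1.0.0
# M2 is a data frame: node.x in the first column, then one column per node.y
M2 <- spread(dta, node.y, weight, fill = 0)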
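For completeness, the same weighted adjacency matrix can probably be obtained in base R alone by cross-multiplying the property-by-node incidence table. This is a sketch under the assumption that each (property, node) pair occurs at most once, so the table holds only 0s and 1s; the object names `inc`, `adj` and `edges` are made up for illustration.

```r
# Same toy data as in the question
dta <- data.frame(
  property = c("red", "red", "red", "blue", "blue", "purple", "purple"),
  node     = c("A", "B", "C", "A", "D", "A", "B"),
  stringsAsFactors = FALSE
)

# Incidence matrix: one row per property, one column per node
# (unclass() turns the table into a plain integer matrix)
inc <- unclass(table(dta$property, dta$node))

# t(inc) %*% inc: entry [i, j] counts the properties shared by nodes i and j
adj <- crossprod(inc)
diag(adj) <- 0  # drop self-loops

# Edge list: keep each undirected pair once (upper triangle, positive weight)
idx <- which(upper.tri(adj) & adj > 0, arr.ind = TRUE)
edges <- data.frame(
  from   = rownames(adj)[idx[, "row"]],
  to     = colnames(adj)[idx[, "col"]],
  weight = adj[idx]
)

print(adj)
print(edges)
```

This is close in spirit to the loop sketched in the question: the matrix product does the per-property accumulation in one step, and `diag(adj) <- 0` removes the self-connections.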