
Convert a DataFrame into Adjacency/Weights Matrix in R


I have a DataFrame, df.

n is a column denoting the number of groups in the x column.
x is a column containing the comma-separated groups.

df <- data.frame(n = c(2, 3, 2, 2), 
                 x = c("a, b", "a, c, d", "c, d", "d, b"))

> df
  n       x
1 2    a, b
2 3 a, c, d
3 2    c, d
4 2    d, b
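Since n is described as the number of groups in x, it can be derived from x itself. Here is a minimal base R check, assuming the groups are always separated by exactly ", " (stringsAsFactors = FALSE is set so strsplit() also works on R versions before 4.0):

```r
df <- data.frame(n = c(2, 3, 2, 2),
                 x = c("a, b", "a, c, d", "c, d", "d, b"),
                 stringsAsFactors = FALSE)

# Split each entry of x on the ", " separator (assumed to be exactly comma + space)
groups <- strsplit(df$x, ", ", fixed = TRUE)

# lengths() returns the number of groups in each row; it should equal n
n_check <- lengths(groups)
n_check  # [1] 2 3 2 2
stopifnot(all(n_check == df$n))
```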

I would like to convert this DataFrame into a weights matrix whose row and column names are the unique groups in df$x, and whose elements count the number of rows of df$x in which each pair of groups appears together (so the diagonal is 0).

The output should look like this:

m <- matrix(c(0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 2, 1, 1, 2, 0), nrow = 4, ncol = 4)
rownames(m) <- letters[1:4]; colnames(m) <- letters[1:4]

> m
  a b c d
a 0 1 1 1
b 1 0 0 1
c 1 0 0 2
d 1 1 2 0
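To make the counting rule concrete, here is a small base R sketch (not from the original question) that enumerates every unordered pair within each row with combn() and increments both symmetric cells. It assumes the groups within a row are distinct and separated by exactly ", "; under those assumptions it should reproduce m above.

```r
df <- data.frame(n = c(2, 3, 2, 2),
                 x = c("a, b", "a, c, d", "c, d", "d, b"),
                 stringsAsFactors = FALSE)

groups <- strsplit(df$x, ", ", fixed = TRUE)
labels <- sort(unique(unlist(groups)))

# Start from an all-zero square matrix indexed by the group labels
w <- matrix(0, nrow = length(labels), ncol = length(labels),
            dimnames = list(labels, labels))

for (g in groups) {
  if (length(g) < 2) next          # a lone group has no partner to pair with
  pairs <- combn(g, 2)             # one column per unordered pair
  for (k in seq_len(ncol(pairs))) {
    i <- pairs[1, k]
    j <- pairs[2, k]
    # Increment both cells so the matrix stays symmetric
    w[i, j] <- w[i, j] + 1
    w[j, i] <- w[j, i] + 1
  }
}
w
```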

Solution

  • Here's a rough and probably inefficient solution that uses the tidyverse for data wrangling and combinat to generate permutations.

    library(tidyverse)
    library(combinat)  # for permn(); note that combinat also masks utils::combn()
    
    df <- data.frame(n = c(2, 3, 2, 2), 
                     x = c("a, b", "a, c, d", "c, d", "d, b"))
    
    df %>% 
        ## Parse each entry in x into a character vector of distinct elements
        mutate(flat = str_split(x, pattern = ', ')) %>% 
        ## Construct 2-element subsets of each set of elements
        mutate(pair = map(flat, combn, 2, simplify = FALSE)) %>% 
        unnest(pair) %>% 
        ## Construct both orderings of each subset so the result is symmetric
        mutate(perm = map(pair, permn)) %>% 
        unnest(perm) %>% 
        ## Parse the permutations into row and column indices
        mutate(row = map_chr(perm, 1), 
               col = map_chr(perm, 2)) %>% 
        ## Name the count column explicitly: df already has a column `n`,
        ## so the default name (`n` or `nn`) depends on the dplyr version
        count(row, col, name = 'count') %>% 
        ## Long to wide representation; pairs that never co-occur
        ## (including the diagonal) are filled with 0
        spread(key = col, value = count, fill = 0) %>% 
        ## Coerce to matrix
        column_to_rownames(var = 'row') %>% 
        as.matrix()
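For comparison, here is a dependency-free base R sketch using a different technique from the answer above: build a 0/1 incidence matrix with one row per record and one column per group, then take crossprod(), which equals t(inc) %*% inc and therefore counts, for every pair of groups, the rows in which both occur. It assumes groups within a row are distinct and separated by exactly ", ". The diagonal of crossprod() holds how many rows each group appears in, so it is set to 0 to match the requested output.

```r
df <- data.frame(n = c(2, 3, 2, 2),
                 x = c("a, b", "a, c, d", "c, d", "d, b"),
                 stringsAsFactors = FALSE)

groups <- strsplit(df$x, ", ", fixed = TRUE)
labels <- sort(unique(unlist(groups)))

# Incidence matrix: one row per record, one column per group label,
# 1 if the label occurs in that record and 0 otherwise
inc <- t(vapply(groups, function(g) as.numeric(labels %in% g),
                numeric(length(labels))))
colnames(inc) <- labels

# Off-diagonal cells: number of records containing both labels
w <- crossprod(inc)
# Diagonal cells would be per-label totals; zero them as in the expected output
diag(w) <- 0
w
```

Because crossprod() handles all pairs in one matrix product, this avoids generating and unnesting permutations row by row.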