r, matrix, reshape

Reconstruct symmetric matrix from values in long-form


I have a TSV file in long form that looks like this:

  one   two   value
  a     b     30
  a     c     40
  a     d     20
  b     c     10
  b     d     05
  c     d     30

I'm trying to get this into a dataframe in R (or pandas) that looks like this:

    a  b  c  d 
a   00 30 40 20
b   30 00 10 05 
c   40 10 00 30
d   20 05 30 00

The problem is that my TSV defines each pair only once (a, b but not b, a), so I get a lot of NAs in my dataframe.

The final goal is to get a distance matrix to use in clustering. Any help would be appreciated.


Solution

  • An igraph solution: read the data frame in as an undirected graph, treating the value column as an edge attribute, then convert the graph to an adjacency matrix. Because the graph is undirected, each a, b pair also fills the b, a cell.

    dat <- read.table(header=TRUE, text=" one   two   value
      a     b     30
      a     c     40
      a     d     20
      b     c     10
      b     d     05
      c     d     30")
    
    library(igraph)
    
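    # Build an undirected graph so that the adjacency matrix is symmetric
    # (graph_from_data_frame() is the current name for graph.data.frame())
    g <- graph_from_data_frame(dat, directed=FALSE)
    
    # Fill the adjacency matrix with the "value" edge attribute
    # (as_adjacency_matrix() is the current name for get.adjacency())
    as_adjacency_matrix(g, attr="value", sparse=FALSE)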
    #   a  b  c  d
    #a  0 30 40 20
    #b 30  0 10  5
    #c 40 10  0 30
    #d 20  5 30  0
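
If you would rather not depend on igraph, the same mirroring can be sketched in base R. This is only one possible approach, not part of the answer above: it assumes every pair appears exactly once in the long-form data and that the diagonal (the distance of an item to itself) should be 0. The names labs, m, d and hc are illustrative. Since the stated goal is clustering, the sketch also converts the matrix with as.dist() and passes it to hclust().

```r
# Same example data as in the question (long form, one row per pair)
dat <- read.table(header = TRUE, text = "one two value
a b 30
a c 40
a d 20
b c 10
b d 05
c d 30")

# as.character() guards against factor columns in older R versions
one <- as.character(dat$one)
two <- as.character(dat$two)

# All labels that appear in either column, in sorted order
labs <- sort(unique(c(one, two)))

# Start from an all-zero square matrix so the diagonal is 0 (assumption)
m <- matrix(0, nrow = length(labs), ncol = length(labs),
            dimnames = list(labs, labs))

# A two-column character matrix indexes cells by row name and column name,
# so fill each (one, two) cell and its mirrored (two, one) cell
m[cbind(one, two)] <- dat$value
m[cbind(two, one)] <- dat$value

# Convert to a "dist" object and run hierarchical clustering on it
d <- as.dist(m)
hc <- hclust(d)
```

Printing m should show the same symmetric layout as the igraph result, and plot(hc) draws the dendrogram.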