How to convert a dyadic formatted dataset to an adjacency matrix?


I have been trying to convert the following dyadic data frame to an adjacency matrix. I tried several approaches (reshape, dcast, ...) but have not gotten what I want so far. The data frame is very long, so I only show a few example rows here:

cntry1   cntry2    var1
usa      canada      70
usa      bahamas     29
usa      cuba        39 
canada   bahamas     15
canada   cuba        35
cuba     bahamas     5 

I'd like to have the above df in the following format:

            usa  canada bahamas  cuba     
usa         0    70        29      39
canada     70     0        15      35
bahamas    29    15        0        5
cuba       39    35        5        0

If I understand the various packages correctly (I am quite new to R), I need to convert the data from long to wide format, which, however, is usually only done with edge lists that have a single id variable.

When I try this on my data, I get the message "Aggregation function missing: defaulting to length", which suggests that there are non-unique values. After reducing the data to unique values, the adjacency matrix is complete for either the rows or the columns, but not both.

Is there a completely different approach you would recommend?

Thanks a lot for your help!


Solution

  • Try:

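     # Order the factor levels as usa, canada, bahamas, cuba
     lvls <- unique(unlist(dat[,1:2]))[c(1,2,4,3)]
     dat[,1:2] <- lapply(dat[,1:2], function(x) factor(x, levels=lvls))
     # Cross-tabulate var1: rows are cntry2, columns are cntry1
     r1 <- xtabs(var1~cntry2+cntry1, dat)
     # Move the nonzero upper-triangle values into the empty lower-triangle cells
     # (the two selections are matched by position, which lines up for this data)
     r1[lower.tri(r1) & !r1] <- r1[upper.tri(r1) & !!r1]
     r1[upper.tri(r1) & !!r1] <- 0
    
     # as.dist() keeps the lower triangle and as.matrix() mirrors it
     as.matrix(as.dist(r1)) #idea contributed by @alexis_laz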
     #         usa canada bahamas cuba
     #usa       0     70      29   39
     #canada   70      0      15   35
     #bahamas  29     15       0    5
     #cuba     39     35       5    0
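
    A sketch of the same idea without the triangle bookkeeping, in case it is easier to follow: build an empty square matrix and fill it by name with a two-column index. The names `nodes` and `adj` are made up for illustration, and the example data is retyped from the question with character columns (an assumption about how your data is stored).

```r
# Example rows from the question, stored as character columns (assumption)
dat <- data.frame(
  cntry1 = c("usa", "usa", "usa", "canada", "canada", "cuba"),
  cntry2 = c("canada", "bahamas", "cuba", "bahamas", "cuba", "bahamas"),
  var1   = c(70, 29, 39, 15, 35, 5),
  stringsAsFactors = FALSE
)

# Desired row/column order (hypothetical helper name)
nodes <- c("usa", "canada", "bahamas", "cuba")
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))

# A two-column character matrix indexes cells by (row name, column name)
adj[cbind(dat$cntry1, dat$cntry2)] <- dat$var1
# Mirror every value, since each dyad is listed only once
adj[cbind(dat$cntry2, dat$cntry1)] <- dat$var1
adj
```

    This only makes sense when every pair appears once; if a pair is listed in both directions, the second assignment overwrites one of the two values.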
    

    Or

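     library(igraph)
     # graph_from_edgelist()/as_adjacency_matrix() are the current names of
     # graph.edgelist()/get.adjacency(); this uses the original dataset
     res <- as_adjacency_matrix(graph_from_edgelist(as.matrix(dat[,1:2]), directed=FALSE))
     # Fill the lower triangle in column order, then mirror it into the upper triangle
     res[lower.tri(res)] <- dat$var1
     res[upper.tri(res)] <- t(res)[upper.tri(res)]
     res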
     #4 x 4 sparse Matrix of class "dgCMatrix"
     #        usa canada bahamas cuba
     #usa       .     70      29   39
     #canada   70      .      15   35
     #bahamas  29     15       .    5
     #cuba     39     35       5    .
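
    The igraph version above fills the lower triangle by position, so it depends on the row order of the data. A possible variant, assuming igraph is installed, is to carry var1 along as an edge attribute and let igraph place the values. The names `g`, `nodes` and `adj_ig` are illustrative only, and the data is retyped from the question.

```r
library(igraph)

# Example rows from the question, stored as character columns (assumption)
dat <- data.frame(
  cntry1 = c("usa", "usa", "usa", "canada", "canada", "cuba"),
  cntry2 = c("canada", "bahamas", "cuba", "bahamas", "cuba", "bahamas"),
  var1   = c(70, 29, 39, 15, 35, 5),
  stringsAsFactors = FALSE
)

# The first two columns become the edges; var1 becomes an edge attribute
g <- graph_from_data_frame(dat, directed = FALSE)

# Use the edge attribute as the cell value instead of 0/1 indicators
adj_ig <- as_adjacency_matrix(g, attr = "var1", sparse = FALSE)

# Reorder rows and columns to the order used in the question
nodes <- c("usa", "canada", "bahamas", "cuba")
adj_ig <- adj_ig[nodes, nodes]
adj_ig
```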
    

    Update

    Assuming that you have a dataset like this, in which a pair can appear in both directions with different values (here usa-canada and canada-usa):

     dat <- structure(list(cntry1 = c("usa", "usa", "usa", "canada", "canada", 
     "cuba", "canada"), cntry2 = c("canada", "bahamas", "cuba", "bahamas", 
     "cuba", "bahamas", "usa"), var1 = c(70L, 29L, 39L, 15L, 35L, 
     5L, 40L)), .Names = c("cntry1", "cntry2", "var1"), class = "data.frame", row.names = c(NA, 
    -7L))
    
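     # Same setup as before: fixed level order, then cross-tabulate (rows cntry2, columns cntry1)
     lvls <- unique(unlist(dat[,1:2]))[c(1,2,4,3)]
     dat[,1:2] <- lapply(dat[,1:2], function(x) factor(x, levels=lvls))
     r1 <- xtabs(var1~cntry2+cntry1, dat)
     r2 <- t(r1)
     # Fill the empty lower-triangle cells of r1 from the transposed table
     indx <- intersect(which(lower.tri(r2) & !!r2), which(lower.tri(r1) & !r1))
     r1[lower.tri(r1) & !r1] <- r2[indx]
    
     # Fill the empty upper-triangle cells the same way; observed values are kept
     indx1 <- upper.tri(r1) & !r1
     r1[indx1] <- r2[indx1]
     r1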
     #              cntry1
     #cntry2    usa canada bahamas cuba
     #usa       0     40      29   39
     #canada   70      0      15   35
     #bahamas  29     15       0    5
     #cuba     39     35       5    0
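
    The result is no longer symmetric because one pair is recorded in both directions. To find such pairs in a long dataset, one option is to build an order-independent key per row and look for repeats. This is only a sketch: `key` and `both` are made-up names, and the data is retyped from this update with character columns.

```r
# Data from this update, stored as character columns (assumption)
dat <- data.frame(
  cntry1 = c("usa", "usa", "usa", "canada", "canada", "cuba", "canada"),
  cntry2 = c("canada", "bahamas", "cuba", "bahamas", "cuba", "bahamas", "usa"),
  var1   = c(70, 29, 39, 15, 35, 5, 40),
  stringsAsFactors = FALSE
)

# Put each pair in alphabetical order so a-b and b-a get the same key
key <- paste(pmin(dat$cntry1, dat$cntry2),
             pmax(dat$cntry1, dat$cntry2), sep = "|")

# Rows whose pair occurs more than once, in either direction
both <- dat[key %in% key[duplicated(key)], ]
both
```

    On the data above this returns the two usa/canada rows, which is also the kind of repetition that makes dcast fall back to length.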
    

    Update No:2

     dat <- structure(list(cntry1 = c("usa", "usa", "usa", "canada", "canada", "cuba", "canada"),
       cntry2 = c("canada", "bahamas", "cuba", "bahamas", "cuba", "bahamas", "usa"),
       var1 = c(4.5, 3, 0.5, 2, 0, 2, 5.5)),
       .Names = c("cntry1", "cntry2", "var1"), class = "data.frame", row.names = c(NA, -7L))
    

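    Here var1 contains a genuine 0, which the code above would treat as an empty cell. Temporarily change the values that are 0 in the var1 column to a value that does not occur in the dataset (0.01 here), and set them back to 0 at the end: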

     dat$var1[!dat$var1] <- 0.01
     lvls <- unique(unlist(dat[,1:2]))[c(1,2,4,3)]
     dat[,1:2] <- lapply(dat[,1:2], function(x) factor(x, levels=lvls))
     r1 <- xtabs(var1~cntry2+cntry1, dat)
     r2 <- t(r1)
     indx <- intersect(which(lower.tri(r2) & !!r2), which(lower.tri(r1) & !r1))
     r1[lower.tri(r1) & !r1] <- r2[indx]
    
     indx1 <- upper.tri(r1) & !r1
     r1[indx1] <- r2[indx1]
     r1[r1==0.01] <- 0
     r1
     #         cntry1
     #cntry2   usa canada bahamas cuba
     # usa     0.0    5.5     3.0  0.5
     # canada  4.5    0.0     2.0  0.0
     # bahamas 3.0    2.0     0.0  2.0
     # cuba    0.5    0.0     2.0  0.0
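
    If you would rather not pick a placeholder value, a possible alternative is to start from a matrix of NA, so that "not observed" and "observed 0" cannot be confused. This is a sketch under the same assumptions as above (character columns, rows following cntry2 and columns following cntry1 to match the xtabs layout); `nodes`, `adj` and `empty` are illustrative names.

```r
# Data from this update, stored as character columns (assumption)
dat <- data.frame(
  cntry1 = c("usa", "usa", "usa", "canada", "canada", "cuba", "canada"),
  cntry2 = c("canada", "bahamas", "cuba", "bahamas", "cuba", "bahamas", "usa"),
  var1   = c(4.5, 3, 0.5, 2, 0, 2, 5.5),
  stringsAsFactors = FALSE
)

nodes <- c("usa", "canada", "bahamas", "cuba")
adj <- matrix(NA_real_, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))

# Observed values: rows follow cntry2, columns follow cntry1
adj[cbind(dat$cntry2, dat$cntry1)] <- dat$var1

# Cells that were never observed borrow the value of the reverse direction
empty <- is.na(adj)
adj[empty] <- t(adj)[empty]

# No self-pairs in the data, so set the diagonal to 0 explicitly
diag(adj) <- 0
adj
```

    Pairs that are missing in both directions would stay NA here, which you can then handle however suits your analysis.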