I have been trying to convert the following dyadic data frame to an adjacency matrix using several approaches (reshape, dcast, ...), but so far I haven't gotten the result I want. It's a very long data frame, so I only show a few example rows here:
cntry1 cntry2 var1
usa canada 70
usa bahamas 29
usa cuba 39
canada bahamas 15
canada cuba 35
cuba bahamas 5
I'd like to have the above df in the following format:
        usa canada bahamas cuba
usa       0     70      29   39
canada   70      0      15   35
bahamas  29     15       0    5
cuba     39     35       5    0
If I understand the various packages correctly (I am quite new to R), I would need to convert the data from long to wide, which, however, is usually only done with edge lists that have a single id variable.
When I apply this to my case, I get the message "Aggregation function missing: defaulting to length", which suggests that there are non-unique values. After reducing the data to unique values, the adjacency matrix is complete either for the rows or for the columns, but not both.
Is there a completely different approach you would recommend?
Thanks a lot for your help!
Try:
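# Country names in the order wanted for the output: usa, canada, bahamas, cuba
lvls <- unique(unlist(dat[,1:2]))[c(1,2,4,3)]
dat[,1:2] <- lapply(dat[,1:2], function(x) factor(x, levels=lvls))
# Cross-tabulate var1; rows are cntry2, columns are cntry1
r1 <- xtabs(var1~cntry2+cntry1, dat)
# Move the nonzero upper-triangle values into the empty lower-triangle cells (here only the cuba/bahamas pair)
r1[lower.tri(r1) & !r1] <- r1[upper.tri(r1) & !!r1]
r1[upper.tri(r1) & !!r1] <- 0
# as.dist() keeps the lower triangle; as.matrix() mirrors it into a symmetric matrix
as.matrix(as.dist(r1)) #idea contributed by @alexis_laz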
#        usa canada bahamas cuba
#usa       0     70      29   39
#canada   70      0      15   35
#bahamas  29     15       0    5
#cuba     39     35       5    0
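For comparison, here is a minimal base R sketch of the same conversion that does not go through factors or xtabs. It pre-allocates a zero matrix and fills it by name with a two-column character index. The names `edges`, `nodes` and `adj` are only illustrative, and the sketch assumes that each unordered pair appears exactly once, as in the example rows of the question.

```r
# Example rows from the question (illustrative object name)
edges <- data.frame(
  cntry1 = c("usa", "usa", "usa", "canada", "canada", "cuba"),
  cntry2 = c("canada", "bahamas", "cuba", "bahamas", "cuba", "bahamas"),
  var1   = c(70, 29, 39, 15, 35, 5),
  stringsAsFactors = FALSE
)

# Row/column order wanted in the result
nodes <- c("usa", "canada", "bahamas", "cuba")

# Start from an all-zero matrix with named rows and columns
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))

# A two-column character matrix indexes cells by (row name, column name);
# fill each pair in both directions so the result is symmetric
adj[cbind(edges$cntry1, edges$cntry2)] <- edges$var1
adj[cbind(edges$cntry2, edges$cntry1)] <- edges$var1

adj
```

Because the cells are addressed by name, the result does not depend on the order of the rows in the data frame.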
Or
library(igraph)
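# as_adjacency_matrix() and graph_from_edgelist() are the current names of get.adjacency() and graph.edgelist()
res <- as_adjacency_matrix(graph_from_edgelist(as.matrix(dat[,1:2]), directed=FALSE)) #using the original dataset
# This works because the rows of dat happen to be in the same order as the lower triangle (column by column)
res[lower.tri(res)] <- dat$var1
res[upper.tri(res)] <- t(res)[upper.tri(res)]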
res
#4 x 4 sparse Matrix of class "dgCMatrix"
#        usa canada bahamas cuba
#usa       .     70      29   39
#canada   70      .      15   35
#bahamas  29     15       .    5
#cuba     39     35       5    .
Assuming that you have a dataset (undirected) like this, in which one pair (usa/canada) is listed in both directions:
dat <- structure(list(cntry1 = c("usa", "usa", "usa", "canada", "canada",
"cuba", "canada"), cntry2 = c("canada", "bahamas", "cuba", "bahamas",
"cuba", "bahamas", "usa"), var1 = c(70L, 29L, 39L, 15L, 35L,
5L, 40L)), .Names = c("cntry1", "cntry2", "var1"), class = "data.frame", row.names = c(NA,
-7L))
lvls <- unique(unlist(dat[,1:2]))[c(1,2,4,3)]
dat[,1:2] <- lapply(dat[,1:2], function(x) factor(x, levels=lvls))
r1 <- xtabs(var1~cntry2+cntry1, dat)
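r2 <- t(r1)
# Empty lower-triangle cells of r1 whose mirrored cell holds a value: copy that value over
indx <- intersect(which(lower.tri(r2) & !!r2), which(lower.tri(r1) & !r1))
r1[lower.tri(r1) & !r1] <- r2[indx]
# Do the same for the empty upper-triangle cells
indx1 <- upper.tri(r1) & !r1
r1[indx1] <- r2[indx1]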
r1
#         cntry1
#cntry2    usa canada bahamas cuba
#  usa       0     40      29   39
#  canada   70      0      15   35
#  bahamas  29     15       0    5
#  cuba     39     35       5    0
dat <- structure(list(cntry1 = c("usa", "usa", "usa", "canada", "canada", "cuba", "canada"),
cntry2 = c("canada", "bahamas", "cuba", "bahamas", "cuba", "bahamas", "usa"),
var1 = c(4.5, 3, 0.5, 2, 0, 2, 5.5)),
.Names = c("cntry1", "cntry2", "var1"), class = "data.frame", row.names = c(NA, -7L))
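The fill steps use !r1 to find empty cells, so a genuine 0 in var1 would be treated as missing. Change the values that are 0 in the var1 column to any other value not in the dataset, and change them back at the end: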
dat$var1[!dat$var1] <- 0.01
lvls <- unique(unlist(dat[,1:2]))[c(1,2,4,3)]
dat[,1:2] <- lapply(dat[,1:2], function(x) factor(x, levels=lvls))
r1 <- xtabs(var1~cntry2+cntry1, dat)
r2 <- t(r1)
indx <- intersect(which(lower.tri(r2) & !!r2), which(lower.tri(r1) & !r1))
r1[lower.tri(r1) & !r1] <- r2[indx]
indx1 <- upper.tri(r1) & !r1
r1[indx1] <- r2[indx1]
r1[r1==0.01] <- 0
r1
#         cntry1
#cntry2    usa canada bahamas cuba
#  usa     0.0    5.5     3.0  0.5
#  canada  4.5    0.0     2.0  0.0
#  bahamas 3.0    2.0     0.0  2.0
#  cuba    0.5    0.0     2.0  0.0
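As a possible variation on the placeholder trick, the sketch below marks empty cells with NA instead of a sentinel value, so genuine zeros in var1 need no special handling. This is not the approach used above, only an alternative under the same assumptions (rows are cntry2, columns are cntry1, and a missing cell is filled from its mirrored cell). The names `edges2`, `nodes`, `m` and `gap` are illustrative.

```r
# Same example data as above, including a genuine 0 (canada/cuba)
edges2 <- data.frame(
  cntry1 = c("usa", "usa", "usa", "canada", "canada", "cuba", "canada"),
  cntry2 = c("canada", "bahamas", "cuba", "bahamas", "cuba", "bahamas", "usa"),
  var1   = c(4.5, 3, 0.5, 2, 0, 2, 5.5),
  stringsAsFactors = FALSE
)

nodes <- c("usa", "canada", "bahamas", "cuba")

# NA means "no value observed yet", so a real 0 stays distinguishable
m <- matrix(NA_real_, nrow = length(nodes), ncol = length(nodes),
            dimnames = list(nodes, nodes))

# Rows are cntry2 and columns are cntry1, as in xtabs(var1 ~ cntry2 + cntry1)
m[cbind(edges2$cntry2, edges2$cntry1)] <- edges2$var1

# Fill each empty cell from its mirrored cell, if that one holds a value
gap <- is.na(m)
m[gap] <- t(m)[gap]

# Whatever is still empty (here only the diagonal) becomes 0
m[is.na(m)] <- 0

m
```

Pairs that are listed in both directions (usa/canada here) keep their two distinct values, because only cells that are still NA are filled from the transpose.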