I have raw data on people who work for several universities at the same time, e.g.:
             UniA UniB UniC UniD
individual_A X    NA   X    NA
individual_B NA   X    NA   X
individual_C NA   X    NA   NA
individual_D X    X    X    NA
I am trying to use this data to build a weighted undirected network between universities. In other words, I would like to generate an adjacency matrix like the example below:
     UniA UniB UniC UniD
UniA 0    1    2    0
UniB      1    1    1
UniC           0    0
UniD                0
How can I do this in R? Any tips or pointers would be much appreciated.
Thank you in advance for your time and help.
EDIT: Can you also help me reshape the data? It actually looks like this:
             position1 position2 position3 position4
individual_A UniA      UniC      NA        NA
individual_B UniB      UniD      NA        NA
individual_C UniB      NA        NA        NA
individual_D UniA      UniB      UniC      NA
I tried to use melt() and cast() from the reshape package to convert the data to the form I showed before:
             UniA UniB UniC UniD
individual_A X    NA   X    NA
individual_B NA   X    NA   X
individual_C NA   X    NA   NA
individual_D X    X    X    NA
However, the values in the raw data are strings (UniA, UniB, ...), so the transformation does not work. Please help.
Here is a possible solution, assuming that the UniB diagonal value should be zero, not one.
Data
dat <- read.table(header = TRUE, text = "
             UniA UniB UniC UniD
individual_A X    NA   X    NA
individual_B NA   X    NA   X
individual_C NA   X    NA   NA
individual_D X    X    X    NA")
Calculation
out <- crossprod(!is.na(dat))  # number of individuals shared by each pair of universities
diag(out) <- 0                 # clear the diagonal
If you want the lower triangle to be zero:
out[lower.tri(out)] <- 0
Explanation
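The expression !is.na(dat) creates a logical matrix recording which values are present (internally, TRUE and FALSE behave as ones and zeros). crossprod() then multiplies the transpose of that matrix by the matrix itself, so each off-diagonal entry counts the individuals who work at both universities, and each diagonal entry counts the individuals at that university. You can then overwrite the diagonal values with the assignment diag(out) <- .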
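To make the cross-product step concrete, here is a small self-contained sketch on the example data. It checks that crossprod(d) gives the same result as t(d) %*% d and shows what the diagonal holds before it is overwritten; the name d is just illustrative.

```r
dat <- read.table(header = TRUE, text = "
             UniA UniB UniC UniD
individual_A X    NA   X    NA
individual_B NA   X    NA   X
individual_C NA   X    NA   NA
individual_D X    X    X    NA")

d <- !is.na(dat)        # logical incidence matrix: individuals x universities
out <- crossprod(d)     # equivalent to t(d) %*% d
stopifnot(isTRUE(all.equal(out, t(d) %*% d)))
diag(out)               # individuals per university: UniA 2, UniB 3, UniC 2, UniD 1
```

The diagonal is simply the column count of each university, which is why it has to be overwritten to get the matrix you want.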
Okay, regarding your comments, there appear to be two processes used to fill the adjacency matrix: 1) the off-diagonal entries record the number of individuals who work at each pair of universities; 2) a diagonal entry is non-zero if that university is the only one an individual works at (although several individuals may have it as their only university). I have assumed that the diagonal value is the number of individuals for whom that university is the only one.
So, continuing from before:
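d <- !is.na(dat)             # TRUE where an individual works at a university
out <- crossprod(d)          # individuals shared by each pair of universities
diag(out) <- 0               # clear the diagonal before refilling it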
id <- rowSums(d)==1 # which individuals only attend one uni
mx <- max.col(d, "first") # if there is only one attended which uni?
tab <- table(mx[id])
diag(out)[as.numeric(names(tab))] <- tab
out
#      UniA UniB UniC UniD
# UniA    0    1    2    0
# UniB    1    1    1    1
# UniC    2    1    0    0
# UniD    0    1    0    0
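As a cross-check on the table()/max.col() step, here is a minimal base-R sketch that fills the diagonal with colSums() instead, under the same assumption (each diagonal entry counts the individuals for whom that university is the only one). It rebuilds the example data so it runs on its own; only_one and out2 are illustrative names.

```r
dat <- read.table(header = TRUE, text = "
             UniA UniB UniC UniD
individual_A X    NA   X    NA
individual_B NA   X    NA   X
individual_C NA   X    NA   NA
individual_D X    X    X    NA")

d <- !is.na(dat)                 # TRUE where an individual works at a university
out2 <- crossprod(d)             # off-diagonals: individuals shared by each pair
only_one <- rowSums(d) == 1      # individuals with exactly one university
# per university, count the individuals for whom it is the only one
diag(out2) <- colSums(d[only_one, , drop = FALSE])
out2
```

drop = FALSE keeps the subset a matrix even when only one individual qualifies, which is exactly the case here (individual_C); without it colSums() would receive a plain vector and fail.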
To reshape your data:
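library(reshape2)
# here dat is the position data from your EDIT, with individuals as row names
dat$id <- rownames(dat)
# long format, dropping empty positions; [-2] drops the "variable" column
m <- melt(dat, id.vars = "id", na.rm = TRUE)[-2]
table(m)  # individuals x universities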
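If reshape2 is not available, a base-R sketch along the following lines goes from the position data straight to the adjacency matrix. It assumes the empty positions are read as NA and uses the diagonal rule described above; pos, inc and adj are illustrative names.

```r
# position data from the EDIT; the header has one field fewer than the
# data rows, so read.table() uses the first column as row names
pos <- read.table(header = TRUE, text = "
             position1 position2 position3 position4
individual_A UniA      UniC      NA        NA
individual_B UniB      UniD      NA        NA
individual_C UniB      NA        NA        NA
individual_D UniA      UniB      UniC      NA")

# unlist() stacks the columns one after another, so the row names are
# repeated once per column; table() drops the NA entries by default
inc <- table(rep(rownames(pos), times = ncol(pos)), unlist(pos))
d <- unclass(inc) > 0            # individuals x universities, logical

adj <- crossprod(d)              # individuals shared by each pair of universities
diag(adj) <- colSums(d[rowSums(d) == 1, , drop = FALSE])
adj
```

Going through table() avoids the type mix in the position columns (character columns alongside an all-NA logical one), because unlist() coerces everything to character first.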