Tags: r, social-networking, adjacency-matrix

How to calculate an adjacency matrix in R from raw data that is non-numeric?


I have raw data on different people who work for several universities at the same time, e.g.:

                UniA  UniB  UniC  UniD
individual_A    X     NA     X     NA
individual_B    NA     X     NA     X
individual_C    NA     X     NA    NA
individual_D    X      X      X    NA

I want to use this data to build a weighted, undirected network between universities. In other words, I would like to generate an adjacency matrix like the following example:

       UniA UniB UniC UniD
UniA     0    1    2    0
UniB          1    1    1
UniC               0    0 
UniD                    0

How can I do this in R? Any tips or pointers would be much appreciated.

Thank you in advance for your time and help.


EDIT: Could you also help me reshape the data? The raw data actually looks like this:

              position1   position2  position3 position4
individual_A   UniA        UniC          NA       NA
individual_B   UniB        UniD          NA       NA
individual_C   UniB        NA            NA       NA
individual_D   UniA        UniB          UniC     NA

I tried using melt() and cast() from the reshape package to convert the data to the form I showed before:

                UniA  UniB  UniC  UniD
individual_A    X     NA     X     NA
individual_B    NA     X     NA     X
individual_C    NA     X     NA    NA
individual_D    X      X      X    NA

However, the values in the raw data are strings (UniA, UniB, ...), so the transformation does not work. Please help.


Solution

  • A possible solution, assuming that the UniB diagonal value is zero, not one.

    Data

    dat = read.table(header=TRUE, text="                UniA  UniB  UniC  UniD
    individual_A    X     NA     X     NA
    individual_B    NA     X     NA     X
    individual_C    NA     X     NA    NA
    individual_D    X      X      X    NA")
    

    Calculation

    out <- crossprod(!is.na(dat))
    diag(out) <- 0
    

    If you want the lower triangle to be zero:

    out[lower.tri(out)] <- 0
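
    Because the network is undirected, each edge only needs to be stored once, in the upper triangle. As a sketch beyond the original answer, base R's which(..., arr.ind = TRUE) can turn that triangle into a weighted edge list. Here out is re-typed from the result printed later in this answer (with the UniB diagonal of 1) so the snippet runs on its own; idx and edges are illustrative names.

    ```r
    # Adjacency matrix for the example, copied from the printed result in this answer
    unis <- paste0("Uni", LETTERS[1:4])
    out <- matrix(c(0, 1, 2, 0,
                    1, 1, 1, 1,
                    2, 1, 0, 0,
                    0, 1, 0, 0),
                  nrow = 4, byrow = TRUE, dimnames = list(unis, unis))

    # Row/column positions of non-zero cells strictly above the diagonal
    idx <- which(upper.tri(out) & out > 0, arr.ind = TRUE)

    # One row per undirected edge, weighted by the number of shared individuals
    edges <- data.frame(from   = rownames(out)[idx[, 1]],
                        to     = colnames(out)[idx[, 2]],
                        weight = out[idx],
                        stringsAsFactors = FALSE)
    edges
    ```

    upper.tri() excludes the diagonal by default, so the UniB self-loop is not listed; each remaining row is one edge.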
    

    Explanation

    !is.na(dat) creates a logical matrix recording whether each value is present (TRUE) or missing (FALSE); in arithmetic these behave as ones and zeros. Taking the cross product then gives, for each pair of universities, the number of individuals who work at both. You can overwrite the diagonal values with the replacement function diag(out) <-.
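
    To see why the cross product counts co-attendance: crossprod(d) is t(d) %*% d, so entry [i, j] sums d[k, i] * d[k, j] over all individuals k, and that product is 1 only when individual k is at both universities. A small check of this reading, with the X/NA table typed in directly as a logical matrix d (TRUE = works there); cp, manual and both_A_C are illustrative names:

    ```r
    unis <- paste0("Uni", LETTERS[1:4])
    d <- matrix(c(TRUE,  FALSE, TRUE,  FALSE,   # individual_A
                  FALSE, TRUE,  FALSE, TRUE,    # individual_B
                  FALSE, TRUE,  FALSE, FALSE,   # individual_C
                  TRUE,  TRUE,  TRUE,  FALSE),  # individual_D
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("individual_", LETTERS[1:4]), unis))

    cp <- crossprod(d)      # logicals are treated as 0/1
    manual <- t(d) %*% d    # the same product written out

    # UniA/UniC co-attendance counted by hand: individuals A and D
    both_A_C <- sum(d[, "UniA"] & d[, "UniC"])

    # The raw diagonal counts everyone at each university
    diag(cp)
    ```

    Here diag(cp) is 2, 3, 2, 1 (the number of individuals at each university), not a pairwise count, which is why the answer overwrites the diagonal.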


    Okay, regarding your comments, there appear to be two rules used to fill the adjacency matrix: 1) the off-diagonal entries record the number of individuals who attend each pair of universities; 2) a diagonal entry is non-zero if it is the only university attended by some individual (although multiple individuals may attend it). I have assumed that the diagonal value is the number of individuals for whom that university is their only one.

    So, following on from before:

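    d <- !is.na(dat)                   # TRUE where an individual works at a university
    out <- crossprod(d)                # co-attendance counts for each pair of universities
    diag(out) <- 0

    id <- rowSums(d) == 1              # which individuals attend only one university
    mx <- max.col(d, "first")          # column of the first attended university per individual
    tab <- table(mx[id])               # number of single-university individuals per column
    diag(out)[as.numeric(names(tab))] <- tab
    out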
    #     UniA UniB UniC UniD
    #UniA    0    1    2    0
    #UniB    1    1    1    1
    #UniC    2    1    0    0
    #UniD    0    1    0    0
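
    Under the same assumption (a diagonal entry counts the individuals attending only that university), the max.col()/table() step could instead be written as a column sum over the single-university individuals. This is a sketch of an alternative, not the original answer's code; the name single is illustrative. It also copes with the case where nobody attends only one university, since colSums() of a zero-row matrix is all zeros.

    ```r
    dat <- read.table(header = TRUE, text = "
                 UniA UniB UniC UniD
    individual_A X    NA   X    NA
    individual_B NA   X    NA   X
    individual_C NA   X    NA   NA
    individual_D X    X    X    NA")

    d <- !is.na(dat)        # TRUE where the individual works at the university
    out <- crossprod(d)     # co-attendance counts between pairs of universities

    # Keep only individuals with exactly one university; drop = FALSE keeps a matrix
    single <- d[rowSums(d) == 1, , drop = FALSE]
    diag(out) <- colSums(single)
    out
    ```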
    

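    To reshape your data (here dat is the position1 ... position4 table from your edit):

    library(reshape2)
    dat$id <- rownames(dat)                             # keep the individuals as a column
    m <- melt(dat, id.vars = "id", na.rm = TRUE)[-2]    # long format, drop the "variable" column
    table(m)                                            # individuals x universities, 1 = works there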
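
    If reshape2 is not available, the same reshaping can be sketched in base R, assuming the position table is read with read.table(); pos, long, inc and adj are illustrative names. The position columns are stacked into (individual, university) pairs, tabulated into an incidence matrix, and then passed through the same cross-product and diagonal steps as above.

    ```r
    pos <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
                 position1 position2 position3 position4
    individual_A UniA      UniC      NA        NA
    individual_B UniB      UniD      NA        NA
    individual_C UniB      NA        NA        NA
    individual_D UniA      UniB      UniC      NA")

    # Long format: one row per (individual, university) pair. unlist() works
    # column by column, so the individual names are repeated once per column.
    long <- data.frame(id  = rep(rownames(pos), times = ncol(pos)),
                       uni = unlist(pos, use.names = FALSE),
                       stringsAsFactors = FALSE)
    long <- long[!is.na(long$uni), ]

    # Individuals x universities incidence matrix (TRUE = works there)
    inc <- unclass(table(long$id, long$uni)) > 0

    # Pair counts off the diagonal, single-university counts on the diagonal
    adj <- crossprod(inc)
    diag(adj) <- colSums(inc[rowSums(inc) == 1, , drop = FALSE])
    adj
    ```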