Tags: r, matrix, binary, igraph, adjacency-matrix

Adjacency Matrix for User in R


I have data as follows:

user_id     post_id
24376261    204506440
98461       204446324
98461       203026202
98461       203031838
311542      204351465
875740      203031838

Each row records a post on a website that a user has commented on. I need to create a matrix with user_id in both the rows and the columns, where a value is 1 if the two users are connected through a common post and 0 otherwise. The output I want looks like this:

user       24376261 98461   311542  875740
24376261    1       0       0       0
98461       0       1       0       1
311542      0       0       1       0
875740      0       1       0       1

How can I do this in R? I tried following Brian's method from the question "Adjacency matrix in R", but I get an R object of the following class:

> class(am)
[1] "dgCMatrix"
attr(,"package")

How can I convert this into a data.frame, or something else that can be exported from R?


Solution

  • Here's an approach that gets you your desired output:

    tcrossprod(table(mydf))
    #           user_id
    # user_id    98461 311542 875740 24376261
    #   98461        3      0      1        0
    #   311542       0      1      0        0
    #   875740       1      0      1        0
    #   24376261     0      0      0        1
    (tcrossprod(table(mydf)) != 0) + 0
    #           user_id
    # user_id    98461 311542 875740 24376261
    #   98461        1      0      1        0
    #   311542       0      1      0        0
    #   875740       1      0      1        0
    #   24376261     0      0      0        1
    

    If you want the result as a data.frame, you can wrap the output in as.data.frame.matrix.
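
    As a minimal sketch of that conversion and a CSV export (the names `adj`, `adj_df`, and `out_file` are illustrative only, and the temporary file is a placeholder path):

    ```r
    # Same sample data as in the question
    mydf <- data.frame(
      user_id = c(24376261L, 98461L, 98461L, 98461L, 311542L, 875740L),
      post_id = c(204506440L, 204446324L, 203026202L,
                  203031838L, 204351465L, 203031838L)
    )

    # Binary user-by-user matrix: 1 if two users share at least one post
    adj <- (tcrossprod(table(mydf)) != 0) + 0

    # Keep the wide (matrix-shaped) layout; row names hold the user ids
    adj_df <- as.data.frame.matrix(adj)

    # Write the result to disk; out_file is a placeholder path
    out_file <- tempfile(fileext = ".csv")
    write.csv(adj_df, out_file)
    ```

    Calling as.data.frame.matrix directly keeps one column per user, so the exported file has the same shape as the matrix.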


    This is assuming that "mydf" is defined as:

    mydf <- structure(list(user_id = c(24376261L, 98461L, 98461L, 98461L, 
    311542L, 875740L), post_id = c(204506440L, 204446324L, 203026202L, 
    203031838L, 204351465L, 203031838L)), .Names = c("user_id", "post_id"), 
    class = "data.frame", row.names = c(NA, -6L))
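
    If you already have a sparse object such as the "dgCMatrix" mentioned in the question, it can probably be converted rather than rebuilt. The following is only a sketch: it assumes the Matrix package is installed, and the way `am` is built here is a hypothetical stand-in, not the code from the linked answer.

    ```r
    library(Matrix)

    # Same sample data as in the question
    mydf <- data.frame(
      user_id = c(24376261L, 98461L, 98461L, 98461L, 311542L, 875740L),
      post_id = c(204506440L, 204446324L, 203026202L,
                  203031838L, 204351465L, 203031838L)
    )

    # Sparse user-by-post incidence matrix (stand-in for however `am` was made)
    u <- factor(mydf$user_id)
    p <- factor(mydf$post_id)
    inc <- sparseMatrix(i = as.integer(u), j = as.integer(p), x = 1,
                        dimnames = list(levels(u), levels(p)))

    # Sparse user-by-user matrix of shared-post counts
    am <- tcrossprod(inc)

    # Densify, reduce counts to 0/1, then convert to a data.frame
    dense <- (as.matrix(am) != 0) + 0
    dense_df <- as.data.frame(dense)
    ```

    Note that as.matrix produces a dense matrix, so this is only practical when the number of users is small enough for a users-by-users table to fit in memory.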