I have data as follows:
user_id post_id
24376261 204506440
98461 204446324
98461 203026202
98461 203031838
311542 204351465
875740 203031838
Each row records a post on a website that a user has commented on. I need to create a matrix with user_id as both rows and columns, where a value is 1 if the two users are connected through a common post and 0 otherwise. The output I want looks like this:
user 24376261 98461 311542 875740
24376261 1 0 0 0
98461 0 1 0 1
311542 0 0 1 0
875740 0 1 0 1
How can I do this in R? I tried following Brian's method from this question, "Adjacency matrix in R", but I get an R object of the following class:
> class(am)
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"
How can I convert this into a data.frame or something else that can be exported from R?
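Here's an approach that gets you your desired output. The first call counts the posts each pair of users has in common; the second converts those counts to 0/1: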
tcrossprod(table(mydf))
# user_id
# user_id 98461 311542 875740 24376261
# 98461 3 0 1 0
# 311542 0 1 0 0
# 875740 1 0 1 0
# 24376261 0 0 0 1
(tcrossprod(table(mydf)) != 0) + 0
# user_id
# user_id 98461 311542 875740 24376261
# 98461 1 0 1 0
# 311542 0 1 0 0
# 875740 1 0 1 0
# 24376261 0 0 0 1
If you want the result as a data.frame, you can wrap the output in as.data.frame.matrix.
This is assuming that "mydf" is defined as:
mydf <- structure(list(user_id = c(24376261L, 98461L, 98461L, 98461L,
311542L, 875740L), post_id = c(204506440L, 204446324L, 203026202L,
203031838L, 204351465L, 203031838L)), .Names = c("user_id", "post_id"),
class = "data.frame", row.names = c(NA, -6L))
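Putting the pieces together, here is a minimal sketch of the whole workflow, including the data.frame conversion and an export to CSV. The names shared, adj, adj_df and out_file are only illustrative, and the CSV goes to a temporary file, so swap in whatever path you need:

```r
# Same data as in the question, rebuilt with data.frame()
mydf <- data.frame(
  user_id = c(24376261L, 98461L, 98461L, 98461L, 311542L, 875740L),
  post_id = c(204506440L, 204446324L, 203026202L,
              203031838L, 204351465L, 203031838L)
)

# table(mydf) is a users x posts count table; tcrossprod() multiplies it
# by its own transpose, giving users x users counts of shared posts
shared <- tcrossprod(table(mydf))

# Collapse the counts to 0/1 flags
adj <- (shared != 0) + 0

# Convert the matrix to a data.frame; user ids become row and column names
adj_df <- as.data.frame.matrix(adj)

# Export (out_file is a placeholder path; row names are written by default)
out_file <- tempfile(fileext = ".csv")
write.csv(adj_df, out_file)
```

If you would rather keep the sparse dgCMatrix from the other approach, calling as.matrix() on it with the Matrix package loaded should give an ordinary dense matrix that can be exported the same way.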