r mongodb matrix sparse-matrix data-cleaning

R list to sparse matrix


I loaded data from MongoDB with R, but the query results are unstructured: they come back as a messy nested list that looks like this:

> df
[[1]]
list()
[[2]]
[[2]][[1]]
[1] "vector1"
[[2]][[2]]
[1] "vector2"
[[3]]
list()
[[4]]
list()
[[5]]
list()
[[6]]
[[6]][[1]]
[1] "vector1"
[[6]][[2]]
[1] "vector2"
[[6]][[3]]
[1] "vector3"

I want to convert the list to a matrix like this, with one row per list element and one column per distinct value:

vector1 vector2 vector3
   0       0       0
   1       1       0
   0       0       0
   0       0       0
   0       0       0
   1       1       1 

I tried sparseMatrix() and sapply(), but both attempts failed. I created the matrix above by hand to make the question clear.


Solution

  • One option is mtabulate from the qdapTools package:

    library(qdapTools)
    mtabulate(df)
    #   vector1 vector2 vector3
    #1       0       0       0
    #2       1       1       0
    #3       0       0       0
    #4       0       0       0
    #5       0       0       0
    #6       1       1       1
    

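    If a base R option is needed, we can loop over the list elements with sapply, convert each one to a factor whose levels are the unique values found anywhere in the list, count the frequencies with table, and transpose (t) the output. Fixing the levels up front is what makes every element, including the empty ones, produce a count for every column.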

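    # Distinct labels across the whole list; used as a common set of factor levels
    Un1 <- unique(unlist(df))
    # Tabulate each element against those levels; empty elements give all zeros
    t(sapply(df, function(x) table(
        if (length(x) == 0) factor(x, levels = Un1)
        else factor(unlist(x), levels = Un1))))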
    #     vector1 vector2 vector3
    #[1,]       0       0       0
    #[2,]       1       1       0
    #[3,]       0       0       0
    #[4,]       0       0       0
    #[5,]       0       0       0
    #[6,]       1       1       1
    

    data

    df <- list(list(),  list("vector1", "vector2"), list(), 
          list(), list(), list("vector1", "vector2", "vector3") )
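Both options above return a dense result. Since the question title asks for a sparse matrix, here is a minimal sketch of one way to build it directly, assuming the Matrix package is installed. The variable names (vals, rows, lev, cols, sm) are only illustrative and are not part of the original answer.

```r
library(Matrix)

df <- list(list(), list("vector1", "vector2"), list(),
           list(), list(), list("vector1", "vector2", "vector3"))

vals <- unlist(df)                        # all labels, flattened in list order
rows <- rep(seq_along(df), lengths(df))   # row index of the element each label came from
lev  <- unique(vals)                      # distinct labels, one column each
cols <- match(vals, lev)                  # column index of each label

# Repeated (row, column) pairs are summed, so a label that occurs twice
# in one element would be counted twice, as with table().
sm <- sparseMatrix(i = rows, j = cols, x = rep(1, length(vals)),
                   dims = c(length(df), length(lev)),
                   dimnames = list(NULL, lev))
sm
```

Only the non-zero cells are stored, which may matter when the real query returns many rows and many distinct values. as.matrix(sm) converts back to an ordinary matrix if needed.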