Tags: r, dplyr, sparse-matrix, purrr

How to cast a tibble to a sparse matrix


Consider this simple tibble:

> data_frame(col1 = c(1,2,3), col2 = c(3,2,NA))
# A tibble: 3 x 2
   col1  col2
  <dbl> <dbl>
1     1     3
2     2     2
3     3    NA

What is the most efficient way to cast it to a sparse matrix? I tried something like this:

> data_frame(col1 = c(1,2,3), col2 = c(3,2,NA)) %>% 
+   as(., 'sparseMatrix')
Error in as(from, "CsparseMatrix") : 
  no method or default for coercing “tbl_df” to “CsparseMatrix”

with no success. Trying as suggested:

y <- purrr::reduce(cbind2, map(df, 'Matrix', sparse = TRUE))

does not work either.

Any good ideas using the tidyverse? Thanks!


Solution

  • This is just a translation of the bounty-awarded answer to the post linked above, from base lapply/Reduce to purrr's map/reduce. The previous answer used:

    Reduce(cbind2, lapply(x[,-1], Matrix, sparse = TRUE))
    

    Part of how this works is that data frames are technically lists, so you can use map to iterate over the columns of the data frame. This yields two sparse matrices, one for each column:

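    library(dplyr)   # provides %>% and re-exports tibble()
    library(purrr)
    
    # tibble() replaces the deprecated data_frame()
    df <- tibble(col1 = c(1,2,3), col2 = c(3,2,NA))
    
    # one 3 x 1 sparse matrix per column
    map(df, Matrix::Matrix, sparse = TRUE)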
    #> $col1
    #> 3 x 1 sparse Matrix of class "dgCMatrix"
    #>       
    #> [1,] 1
    #> [2,] 2
    #> [3,] 3
    #> 
    #> $col2
    #> 3 x 1 sparse Matrix of class "dgCMatrix"
    #>        
    #> [1,]  3
    #> [2,]  2
    #> [3,] NA
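
    To make the "data frames are lists" point concrete, here is a minimal sketch (not part of the original answer) that checks it directly. It assumes the tibble and purrr packages are installed; the variable name cols is only illustrative.

    ```r
    library(purrr)

    # Same example data as above, built with tibble::tibble()
    df <- tibble::tibble(col1 = c(1, 2, 3), col2 = c(3, 2, NA))

    # A tibble is a list whose elements are the column vectors
    is.list(df)
    #> [1] TRUE

    # map() therefore visits one column at a time and keeps the column names
    cols <- map(df, identity)
    names(cols)
    #> [1] "col1" "col2"
    ```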
    

    If you then reduce it with cbind2, that gets you a single sparse matrix.

    map(df, Matrix::Matrix, sparse = TRUE) %>% 
      reduce(cbind2)
    #> 3 x 2 sparse Matrix of class "dgCMatrix"
    #>          
    #> [1,] 1  3
    #> [2,] 2  2
    #> [3,] 3 NA
    

    Created on 2018-10-16 by the reprex package (v0.2.1)
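
    If you need this more than once, the two steps can be wrapped in a small helper. This is only a sketch under a few assumptions: tibble_to_sparse is a hypothetical name (not from the original answer), all columns are assumed to be numeric, and the column names are restored by hand because the combined matrix shown above has none.

    ```r
    library(purrr)
    library(Matrix)

    # Hypothetical helper: assumes every column of df is numeric
    tibble_to_sparse <- function(df) {
      # one sparse column per tibble column, then bind them side by side
      m <- reduce(map(df, Matrix, sparse = TRUE), cbind2)
      # restore the column names from the tibble
      colnames(m) <- names(df)
      m
    }

    df <- tibble::tibble(col1 = c(1, 2, 3), col2 = c(3, 2, NA))
    m <- tibble_to_sparse(df)
    dim(m)
    #> [1] 3 2
    ```

    Note that the NA in col2 is kept as an explicit stored entry; it is not treated as an implicit zero.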