
Counting number of elements that fall between two values in each column of sparse matrix


I have a sparse matrix such as the one below:

library(Matrix)

set.seed(2019)
nrows <- 10L
ncols <- 5L
vals <- sample(
  x = c(0,1,2,3),
  prob = c(0.7,0.1,0.1,0.1),
  size = nrows*ncols,
  replace = TRUE
)
mat <- matrix(vals,nrow=nrows)
matSparse <- as(mat,"sparseMatrix")

> matSparse
10 x 5 sparse Matrix of class "dgCMatrix"

 [1,] 2 2 . . .
 [2,] 2 . . . .
 [3,] . . 1 3 3
 [4,] . . . . .
 [5,] . . . . 3
 [6,] . . . . .
 [7,] 3 . . . 1
 [8,] . 2 1 . 1
 [9,] . . . . .
[10,] . . . 2 .

For each column, I'd like to count the elements that fall within certain ranges, where the cut-off may differ by column. For example, I have a vector of length ncols, brks = c(1, 2, 1, 2, 2). For each column j I would like to compute:

1) The number of elements that are > 0 (zeros are printed as "." above) and <= brks[j]
2) The number of elements that are > brks[j]

In the above example, the result would be 1) 0 2 2 1 2 and 2) 3 0 0 1 2.

I've tried creating logical matrices of class lgeMatrix and applying colSums, but have been unsuccessful. Ultimately I need an efficient way of doing this, because my real matrices are very large (10000 rows and 100000 columns).


Solution

  • One option is to compare against a matrix of the same dimensions, built by repeating brks in every row:

    cmpr <- t(brks)[rep(1,nrow(matSparse)),]
    
    colSums(matSparse > 0 & matSparse <= cmpr)
    #[1] 0 2 2 1 2
    
    colSums(matSparse > cmpr)
    #[1] 3 0 0 1 2
    

    Or use sweep with a custom comparison function:

    gt0ltB <- function(x,y) x > 0 & x <= y
    gtB    <- function(x,y) x > y
    
    colSums(sweep(matSparse, STATS=brks, MARGIN=2, FUN=gt0ltB))
    #[1] 0 2 2 1 2
    colSums(sweep(matSparse, STATS=brks, MARGIN=2, FUN=gtB))
    #[1] 3 0 0 1 2
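
Both approaches above build a dense comparison matrix with the full dimensions of matSparse, which may be too much memory at 10000 x 100000. As a rough sketch of an alternative (not part of the original answer), the counts can be computed from the stored entries alone, using the x and p slots of a dgCMatrix. The names col_of_entry, cnt_low and cnt_high are made up for illustration, the example matrix is rebuilt by hand from the printout in the question, and the sketch assumes every value in brks is >= 0, so that implicit zeros never count as "> brks[j]".

```r
library(Matrix)

# Rebuild the example matrix from the printed output in the question
# (row index, column index, value of each non-zero entry).
i <- c(1, 2, 7,  1, 8,  3, 8,  3, 10,  3, 5, 7, 8)
j <- c(1, 1, 1,  2, 2,  3, 3,  4, 4,   5, 5, 5, 5)
x <- c(2, 2, 3,  2, 2,  1, 1,  3, 2,   3, 3, 1, 1)
matSparse <- sparseMatrix(i = i, j = j, x = x, dims = c(10, 5))
brks <- c(1, 2, 1, 2, 2)  # assumption: all cut-offs are >= 0

# In a dgCMatrix, @x holds the stored values column by column and
# diff(@p) gives the number of stored values in each column.
col_of_entry <- rep(seq_len(ncol(matSparse)), diff(matSparse@p))
vals <- matSparse@x
cut  <- brks[col_of_entry]  # cut-off matching each stored value

# Count the qualifying stored values per column.
cnt_low  <- tabulate(col_of_entry[vals > 0 & vals <= cut], nbins = ncol(matSparse))
cnt_high <- tabulate(col_of_entry[vals > cut],             nbins = ncol(matSparse))

cnt_low
#[1] 0 2 2 1 2
cnt_high
#[1] 3 0 0 1 2
```

The working vectors here have one element per stored value rather than one per matrix cell, so memory use scales with the number of non-zeros instead of nrow * ncol.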