Search code examples
rsparse-matrix

Optimized way to replace non zero values with zero in a sparse matrix based on another sparse matrix


I have two large sparse matrices (say A and B). I want to replace non zero elements to zero in A based on B matrix which contains ranks of all the individual elements ranked across every column. My output matrix should contain top n and bottom n ranked elements from A matrix and all the other non zero values should be made equal to zero.

Below is my approach. I am using loops in the function GetTopNBottomN, I am wondering if it could be optimized since it takes ages when the matrices get large.

#input matrix
TestMatrix = Matrix(c(0.80,0.9,0.6,0,0,0.3,0.5,
                  0,0,0.3,0,0,0,0,
                  0.4,0.5,0.6,0,0,0.1,0,
                  0,0,0,0,0,0,0,
                  0.3,0.4,0.5,0.2,0.1,0.7,0.8,
                  0.6,0.7,0.5,0.8,0,0,0),7,sparse = TRUE) 

#function to genrate ranks across all the columns for the input matrix
GenerateRankMatrix <- function(aMatrix){ ## Function Begins
  n <- diff(aMatrix@p)  ## number of non-zeros per column
  lst <- split(aMatrix@x, rep.int(1:ncol(aMatrix), n))  ## columns to list
  r <- unlist(lapply(lapply(lst,function(x) x * -1), rank))  ## column-wise   ranking and result collapsing
  RankMatrix <- aMatrix  ## copy sparse matrix
  RankMatrix@x <- r  ## replace non-zero elements with rank
  return(RankMatrix)
} # Function Ends

## Function to retain Top N and Bottom N records
GetTopNBottomN <- function(aMatrix,rMatrix){
  #aMatrix = original SparseMatrix, rMatrix = RankMatrix
  n = 2 ## Top 2 and Bottom 2 Elements across all columns
  for(j in 1:ncol(aMatrix)){ 
    MaxValue = max(rMatrix[,j])
    if(MaxValue <= 2 * n) next  ##Ignore the column if there are less than or equal to 2*n nonzero values
    for (i in 1: nrow(aMatrix)){
      if(rMatrix[i,j] >n & rMatrix[i,j] <= MaxValue-n){ #IF Cond
        aMatrix[i,j] = 0
      } #IF ends
    }

  }   
  return(aMatrix)
}

#Output
RankMatrix = GenerateRankMatrix(TestMatrix) #Genrate Rank Matrix
#Output Matrix
GetTopNBottomN(TestMatrix,RankMatrix)

Solution

  • I extracted the indexes of the non-zero elements, and used ave() to calculate the group-wise ranks

    idx <- which(TestMatrix != 0, arr.ind=TRUE)
    ranks = ave(-TestMatrix[idx], idx[,2], FUN=rank)
    

    or actually your desired result, the values to keep

    keep = ave(-TestMatrix[idx], idx[,2], FUN=function(elt) {
        elt = rank(elt)
        (elt > 2) & (elt <= length(elt) - 2)
    }) == 0
    idx = idx[keep,]
    

    Then create a new sparse matrix

    sparseMatrix(idx[,1], idx[,2], x=TestMatrix[idx])