I have a fairly large sparse matrix (40,000 x 100,000+) and I want to replace an element with 1 if it is greater than some threshold. However, each row has its own threshold value (the thresholds are stored in a vector with one entry per row), so I want to go row by row and check whether each element of a row is greater than that row's threshold.
I originally attempted this with a for loop over all the non-zero elements of the sparse matrix, but this took far too long since there are over 100 million of them.
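number_of_elem <- length(matrix@x)
for (j in seq_len(number_of_elem)) {
  # Threshold of the row this non-zero element belongs to
  # (matrix@i holds zero-based row indices, hence the + 1).
  threshold <- thres_array[matrix@i[j] + 1]
  if (threshold == 0) {
    next
  }
  if (matrix@x[j] > threshold) {
    matrix@x[j] <- 1
  }
}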
I then tried to use the apply function, but I could not figure out how to skip a row whose threshold is zero. For reference, I first calculated the quantiles of each row and set each row's threshold at the 95th percentile. Since it is a sparse matrix, some of the threshold values are zero.
Any ideas on how to approach this? From what I know, in R it is strongly preferred to vectorize code and avoid for loops, but I could not think of a method that scales.
I modified @Bas's solution so that it exploits the sparsity of the matrix, which improves performance.
mat@x[mat@x > thres_array[mat@i + 1] ] <- 1
mat@x gives the non-zero elements of the sparse matrix, and mat@i gives the row each non-zero element belongs to (you have to add 1 since it is zero-indexed). Since the elements of thres_array correspond to rows, mat@x > thres_array[mat@i + 1] is a logical vector marking the elements that exceed their row's threshold, and those elements are reassigned to 1.
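To make the indexing concrete, here is a minimal sketch on a toy matrix. It assumes mat is a column-compressed dgCMatrix from the Matrix package (the class for which mat@i holds row indices), and it adds an explicit check that skips rows whose threshold is zero, which the one-liner above does not do. The names row_thres, hit and result, as well as the toy values, are made up for illustration.

```r
library(Matrix)

# Toy 3 x 4 sparse matrix; sparseMatrix() returns a dgCMatrix by default.
mat <- sparseMatrix(
  i = c(1, 1, 2, 3, 3),
  j = c(1, 3, 2, 1, 4),
  x = c(0.2, 5, 3, 7, 0.5),
  dims = c(3, 4)
)

# One threshold per row; row 2 has a zero threshold and should be skipped.
thres_array <- c(1, 0, 2)

# Threshold of the row that each stored non-zero element belongs to
# (row_thres is a hypothetical helper name, not part of the original answer).
row_thres <- thres_array[mat@i + 1]

# Replace only where the row threshold is non-zero and the value exceeds it.
hit <- row_thres != 0 & mat@x > row_thres
mat@x[hit] <- 1

# Dense copy for inspection only; do not do this on the full-size matrix.
result <- as.matrix(mat)
print(result)
```

Only the stored non-zero values are touched, so the work is proportional to length(mat@x) rather than to the full 40,000 x 100,000 grid. If zero thresholds should not be skipped, dropping the row_thres != 0 condition gives the same behaviour as the one-liner.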