
Removing rows from a sparse matrix of class "dgCMatrix" in R


I am a bioinformatics student who is fairly new to R. I am working with a sparse matrix of class "dgCMatrix", called sobj.data, and I'd like to find a way to conditionally remove rows from it.

What I'd like to do is iterate over the sparse matrix and delete any row whose row name begins with "LOC107", such as the ninth and eleventh rows of my matrix. I understand how to set up a for loop to do the iteration, but what I can't figure out is how to remove the row itself.

This is what I've got so far:

Does anybody know how to remove the row in R using this for loop I have written (i.e. fill in the if statement)?


Solution

  • Here is a way to do it with non-sparse matrices in base R, with no loop needed. The same logical row indexing also works on dgCMatrix objects when the Matrix package is loaded. Assume that the matrix is named X.

    keep_rows <- !grepl("^LOC107", rownames(X))
    Xnew <- X[keep_rows,]
    

    NOTE: In general, if there might be only a single row left after this process, use Xnew <- X[keep_rows, , drop=FALSE] so that the result stays a matrix instead of being simplified to a vector.
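
For completeness, here is a minimal sketch of the same logical-indexing idea applied directly to a dgCMatrix. It assumes the Matrix package is installed. The toy matrix and its row and column names are made up for illustration and only stand in for sobj.data. The logical vector returned by grepl() is inverted with `!`, so that rows matching the pattern are dropped.

```r
library(Matrix)  # assumed to be installed; provides the dgCMatrix class

# Hypothetical toy data standing in for sobj.data: 4 genes x 2 cells
X <- sparseMatrix(
  i = c(1, 2, 3, 4),
  j = c(1, 2, 1, 2),
  x = c(5, 3, 2, 7),
  dims = c(4, 2),
  dimnames = list(
    c("GENE1", "LOC107001", "GENE2", "LOC107002"),
    c("cell1", "cell2")
  )
)

# TRUE for rows to keep, i.e. row names that do NOT start with "LOC107"
keep_rows <- !grepl("^LOC107", rownames(X))

# Subset rows; drop = FALSE keeps a matrix even if one row remains
Xnew <- X[keep_rows, , drop = FALSE]
```

With real data, the same two lines at the end should apply with sobj.data in place of X. No for loop is required because the subsetting is vectorized over all row names at once.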