
How to filter genes from seuratobject in slotname @data?


I am working with an R package called "Seurat" for single-cell RNA-Seq analysis, and I am trying to remove a few genes from the 'data' slot of a Seurat object (S4 class). The object has several other slots that store information associated with the 'data' slot. The 'data' slot holds a matrix with gene names in rows and cell IDs in columns, containing the expression value of each gene in each cell. I want to remove entire rows based on unique gene names but keep the result in the object.

Example:

       Cell1 Cell2 Cell3  
GeneA2    5    9    2     
GeneA     3    1    0  
GeneA1    2    1    3  

I want to remove row GeneA in the matrix.

I tried the following but got errors:

object<-SubsetRow(object@data, "GeneA", invert = TRUE)

and

GeneA<-grep(pattern = "^GeneA$", x = rownames(x = object@data), value = TRUE)
object@data<- object@data[!GeneA,]

Solution

  • Let's assume that you are working with something similar to the pbmc_small data object that is loaded with the Seurat package. Then look at the example on the ?SubsetRow help page:

    # Installing Seurat pulls in quite a few additional packages
    library(Seurat)
    cd_genes <- SubsetRow(data = pbmc_small@data, code = 'CD')
    str(cd_genes)
    #=================
    Formal class 'dgTMatrix' [package "Matrix"] with 6 slots
      ..@ i       : int [1:209] 0 5 6 9 5 8 9 10 15 5 ...
      ..@ j       : int [1:209] 0 0 0 0 1 1 1 1 1 2 ...
      ..@ Dim     : int [1:2] 16 80
      ..@ Dimnames:List of 2
      .. ..$ : chr [1:16] "CD79B" "CD79A" "CD19" "CD180" ...
      .. ..$ : chr [1:80] "ATGCCAGAACGACT" "CATGGCCTGTGCAT" "GAACCTGATGAACC" "TGACTGGATTCTCA" ...
      ..@ x       : num [1:209] 1 4 1 2 4 2 2 1 1 4 ...
      ..@ factors : list()
     #===========
    rownames(x = cd_genes@data)
    

    Error in rownames(x = cd_genes@data) : no slot of name "data" for this object of class "dgTMatrix"

    So there is no @data slot in that object

    Instead just use rownames on cd_genes:

    rownames(x = cd_genes)
     [1] "CD79B"   "CD79A"   "CD19"    "CD180"   "CD200"   "CD3D"    "CD2"     "CCDC104" "CD3E"   
    [10] "CD7"     "CD8A"    "CD14"    "CD1C"    "CD68"    "CD9"     "CD247"  
    

    So this returns the matrix without the row named "CD200":

    > object<-SubsetRow(pbmc_small@data, code="^CD200$", invert = TRUE)
    > str(object)
    Formal class 'dgTMatrix' [package "Matrix"] with 6 slots
      ..@ i       : int [1:4453] 1 5 8 11 22 29 32 33 35 37 ...
      ..@ j       : int [1:4453] 0 0 0 0 0 0 0 0 0 0 ...
      ..@ Dim     : int [1:2] 229 80
      ..@ Dimnames:List of 2
      .. ..$ : chr [1:229] "MS4A1" "CD79B" "CD79A" "HLA-DRA" ...
      .. ..$ : chr [1:80] "ATGCCAGAACGACT" "CATGGCCTGTGCAT" "GAACCTGATGAACC" "TGACTGGATTCTCA" ...
      ..@ x       : num [1:4453] 1 1 3 1 1 4 1 5 1 1 ...
      ..@ factors : list()
    > "CD200" %in% rownames(object)
    [1] FALSE
    > "CD200" %in% rownames(pbmc_small@data)
    [1] TRUE
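
    The same exact-name removal can be sketched in base R without SubsetRow, using the toy matrix from the question. This is only an illustration: `expr`, `genes_to_remove` and `filtered` are made-up names, and a plain dense matrix stands in for the sparse one. It also shows why the second attempt in the question fails: grep(..., value = TRUE) returns the matching names as a character vector, and R cannot negate a character vector with `!`, so the row index has to be a logical vector instead.

    ```r
    # Toy expression matrix from the question (genes in rows, cells in columns).
    expr <- matrix(
      c(5, 9, 2,
        3, 1, 0,
        2, 1, 3),
      nrow = 3, byrow = TRUE,
      dimnames = list(c("GeneA2", "GeneA", "GeneA1"),
                      c("Cell1", "Cell2", "Cell3"))
    )

    # Exact-name match: keep every row whose name is not in the removal list.
    genes_to_remove <- "GeneA"
    filtered <- expr[!(rownames(expr) %in% genes_to_remove), , drop = FALSE]
    rownames(filtered)                 # "GeneA2" "GeneA1"

    # An unanchored pattern also matches GeneA1 and GeneA2 ...
    grepl("GeneA", rownames(expr))     # TRUE TRUE TRUE
    # ... whereas anchoring with ^ and $ matches only the exact name.
    grepl("^GeneA$", rownames(expr))   # FALSE TRUE FALSE
    ```

    Using %in% on the row names avoids regular expressions altogether, which helps when gene names contain characters such as "." or "-".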
    

    Seurat objects do have a slot named data, but once you have extracted it, the resulting matrix has no data slot of its own:

    slotNames(pbmc_small)
     [1] "raw.data"     "data"         "scale.data"   "var.genes"    "is.expr"      "ident"       
     [7] "meta.data"    "project.name" "dr"           "assay"        "hvg.info"     "imputed"     
    [13] "cell.names"   "cluster.tree" "snn"          "calc.params"  "kmeans"       "spatial"     
    [19] "misc"         "version"  
    
     slotNames(pbmc_small@data)
    [1] "i"        "p"        "Dim"      "Dimnames" "x"        "factors" 
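
    The distinction between the container and the extracted matrix can be checked with a small self-contained sketch. The class `ToyObject` is a hypothetical stand-in for a Seurat object and holds a plain dense matrix; it is an assumption made for illustration, not Seurat's actual class definition.

    ```r
    library(methods)

    # Hypothetical stand-in for a Seurat object: an S4 class with a 'data' slot.
    setClass("ToyObject", representation(data = "matrix"))

    toy <- new("ToyObject",
               data = matrix(1:6, nrow = 3,
                             dimnames = list(c("GeneA2", "GeneA", "GeneA1"),
                                             c("Cell1", "Cell2"))))

    extracted <- toy@data          # the matrix itself, not the container

    .hasSlot(toy, "data")          # TRUE: the container has a 'data' slot
    .hasSlot(extracted, "data")    # FALSE: the extracted matrix does not
    rownames(extracted)            # row names are read from the matrix directly
    ```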
    

    Based on the comment, it appears the issue has not been fully communicated. If the question is how to modify an existing slot value, then just use @<-, as exemplified here:

    pbmc_small2 <- pbmc_small
    pbmc_small2@data <- SubsetRow(data = pbmc_small@data, code = 'CD')
    

    I'm not sure it's safe, however. The dimensions of the @data slot are now different than the dimensions of the @raw.data slot and other features may not match, although I don't know enough about that structure to be sure. The safe way to use S4 objects is to rely on the functions provided by package authors rather than mucking with low-level stuff like slots. Obviously, they wanted you to be able to subset from sparse matrices, but whether they wanted you to assign them back to slots is not so clear.
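
    One way to reduce the mismatch risk is to apply the same filter to every gene-indexed slot at once. The following is a minimal sketch on a toy S4 class, not on a real Seurat object: `ToyContainer` and `drop_genes` are hypothetical names, and the class models only two of the slots listed above (raw.data and data). A real Seurat object has further gene-related slots, such as scale.data, var.genes and hvg.info, that this sketch does not handle.

    ```r
    library(methods)

    # Hypothetical container with two gene-by-cell matrices, loosely mirroring
    # the 'raw.data' and 'data' slots. This is not Seurat's actual class.
    setClass("ToyContainer", representation(raw.data = "matrix", data = "matrix"))

    counts <- matrix(c(5, 9, 2, 3, 1, 0, 2, 1, 3), nrow = 3, byrow = TRUE,
                     dimnames = list(c("GeneA2", "GeneA", "GeneA1"),
                                     c("Cell1", "Cell2", "Cell3")))
    toy <- new("ToyContainer", raw.data = counts, data = log1p(counts))

    # Hypothetical helper: drop the named genes from every gene-indexed slot.
    drop_genes <- function(object, genes) {
      for (slot_name in c("raw.data", "data")) {
        m <- slot(object, slot_name)
        slot(object, slot_name) <- m[!(rownames(m) %in% genes), , drop = FALSE]
      }
      object
    }

    toy2 <- drop_genes(toy, "GeneA")

    # Both slots still agree on which genes they contain.
    identical(rownames(toy2@raw.data), rownames(toy2@data))   # TRUE
    dim(toy2@data)                                            # 2 3
    ```

    Because the helper returns a modified copy, the original object stays untouched and the two can be compared before the filtered version is used downstream.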