
Odd behaviour of sample when subsetting Seurat objects


I am seeing a peculiar behaviour from the R Seurat package, when trying to subset objects to specific sets of cells.

Say I generate three sets of random cell names from a Seurat object using sample():

library(Seurat)

set.seed(12345)

ten_cells_id <- sample(Cells(pbmc_small), 10)
other_ten_ids <- sample(Cells(pbmc_small), 10)
and_other_ten <- sample(Cells(pbmc_small), 10)

I can now subset the object using [] and print the cell names:

Cells(pbmc_small[, ten_cells_id])
Cells(pbmc_small[, other_ten_ids])
Cells(pbmc_small[, and_other_ten])

No surprises here; it yields three different sets of cells, as expected.

> Cells(pbmc_small[, ten_cells_id])
 [1] "CATGAGACACGGGA" "CGTAGCCTGTATGC" "ACTCGCACGAAAGT" "CTAGGTGATGGTTG" "TTACGTACGTTCAG" "CATGGCCTGTGCAT"
 [7] "ACAGGTACTGGTGT" "AATGTTGACAGTCA" "GATAGAGAAGGGTG" "CATTACACCAACTG"
> Cells(pbmc_small[, other_ten_ids])
 [1] "GGCATATGCTTATC" "ACAGGTACTGGTGT" "CATCAGGATGCACA" "ATGCCAGAACGACT" "GAGTTGTGGTAGCT" "GGCATATGGGGAGT"
 [7] "AGAGATGATCTCGC" "GAACCTGATGAACC" "GATATAACACGCAT" "CATGAGACACGGGA"
> Cells(pbmc_small[, and_other_ten])
 [1] "GGGTAACTCTAGTG" "TTTAGCTGTACTCT" "TACATCACGCTAAC" "CTAAACCTGTGCAT" "ATACCACTCTAAGC" "CATGCGCTAGTCAC"
 [7] "GATAGAGAAGGGTG" "ATTACCTGCCTTAT" "GCGCATCTTGCTCC" "ACAGGTACTGGTGT"

However, if I do

cells1 <- pbmc_small[, sample(Cells(pbmc_small), 10)]
cells2 <- pbmc_small[, sample(Cells(pbmc_small), 10)]
cells3 <- pbmc_small[, sample(Cells(pbmc_small), 10)]

Cells(cells1)
Cells(cells2)
Cells(cells3)

I get the same thing three times:

> Cells(cells1)
 [1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA" "TACAATGATGCTAG" "CATGAGACACGGGA"
 [7] "GCACTAGACCTTTA" "CGTAGCCTGTATGC" "TTACCATGAATCGC" "ATAAGTTGGTACGT"
> Cells(cells2)
 [1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA" "TACAATGATGCTAG" "CATGAGACACGGGA"
 [7] "GCACTAGACCTTTA" "CGTAGCCTGTATGC" "TTACCATGAATCGC" "ATAAGTTGGTACGT"
> Cells(cells3)
 [1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA" "TACAATGATGCTAG" "CATGAGACACGGGA"
 [7] "GCACTAGACCTTTA" "CGTAGCCTGTATGC" "TTACCATGAATCGC" "ATAAGTTGGTACGT"

The values are always the same, regardless of the seed I use! I guess that R is somehow resetting the seed each time. This is not an issue with [] in general, as:

a <- 1:100
a[sample(1:100, 10)]
a[sample(1:100, 10)]
a[sample(1:100, 10)]

returns three different results.

The only thing I can think of is that something strange is happening because Seurat overloads []. Any ideas?


Solution

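  • It looks like this is because [.Seurat() calls subset.Seurat(), which in turn calls WhichCells(). WhichCells() has a seed argument, which defaults to 1, so each subsetting call appears to reset R's random number generator to the same state, and every sample() call that follows starts from that state. You can override this by setting seed to NULL, and thankfully the argument is also passed through if you supply it to [ like so: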

    library(Seurat)
    #> Attaching SeuratObject
    #> Attaching sp
    
    set.seed(12345)
    
    cells1 <- pbmc_small[, sample(Cells(pbmc_small), 10), seed = NULL]
    cells2 <- pbmc_small[, sample(Cells(pbmc_small), 10), seed = NULL]
    cells3 <- pbmc_small[, sample(Cells(pbmc_small), 10), seed = NULL]
    
    Cells(cells1)
    #>  [1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA"
    #>  [5] "TACAATGATGCTAG" "CATGAGACACGGGA" "GCACTAGACCTTTA" "CGTAGCCTGTATGC"
    #>  [9] "TTACCATGAATCGC" "ATAAGTTGGTACGT"
    Cells(cells2)
    #>  [1] "GTCATACTTCGCCT" "TGGTATCTAAACAG" "ATCATCTGACACCA" "GTTGACGATATCGG"
    #>  [5] "GACGCTCTCTCTCG" "AGATATACCCGTAA" "CTTCATGACCGAAT" "CTAACGGAACCGAT"
    #>  [9] "TACTCTGAATCGAC" "GCGTAAACACGGTT"
    Cells(cells3)
    #>  [1] "GTCATACTTCGCCT" "GCTCCATGAGAAGT" "ACAGGTACTGGTGT" "TACATCACGCTAAC"
    #>  [5] "CCATCCGATTCGCC" "GACGCTCTCTCTCG" "CTTCATGACCGAAT" "GCGTAAACACGGTT"
    #>  [9] "CATTACACCAACTG" "CTTGATTGATCTTC"
    

    Created on 2022-10-17 with reprex v2.0.2

    In my opinion this is quite poorly documented, and the behaviour is confusing enough to possibly justify opening a new issue on the seurat-object GitHub repository.
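
To see the mechanism without Seurat, here is a minimal base R sketch. The helper subset_with_seed() is hypothetical, not part of Seurat. It only imitates the behaviour described above, on the assumption that WhichCells() calls set.seed() whenever its seed argument is not NULL.

```r
# Hypothetical stand-in for a subsetting function with a `seed` argument
# that defaults to 1 (an assumption about how WhichCells() behaves).
subset_with_seed <- function(x, idx, seed = 1) {
  force(idx)          # evaluate the index before touching the RNG
  if (!is.null(seed)) {
    set.seed(seed)    # side effect: resets the global RNG state
  }
  x[idx]
}

pool <- 1:100

# Default seed: each call leaves the RNG in the state produced by
# set.seed(1), so every later sample() starts from that same state.
set.seed(12345)
draw1 <- subset_with_seed(pool, sample(pool, 10))
draw2 <- subset_with_seed(pool, sample(pool, 10))
draw3 <- subset_with_seed(pool, sample(pool, 10))
identical(draw2, draw3)   # TRUE: both were sampled right after a reset

# seed = NULL: the RNG state is left alone and the draws advance normally.
set.seed(12345)
free1 <- subset_with_seed(pool, sample(pool, 10), seed = NULL)
free2 <- subset_with_seed(pool, sample(pool, 10), seed = NULL)
identical(free1, free2)   # expected to be FALSE
```

In this sketch draw1 is sampled before the first reset, so only the later draws repeat. In the question all three subsets matched, presumably because earlier subsetting calls in the same session had already reset the seed. Drawing the cell names into variables first, as in the question's first snippet, also sidesteps the problem because all the sampling happens before any subsetting call.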