I am seeing a peculiar behaviour from the R Seurat package when trying to subset objects to specific sets of cells.
Say that I generate three sets of random cell names from a Seurat object using sample():
library(Seurat)
set.seed(12345)
ten_cells_id <- sample(Cells(pbmc_small), 10)
other_ten_ids <- sample(Cells(pbmc_small), 10)
and_other_ten <- sample(Cells(pbmc_small), 10)
I can now subset the object using [] and print the cell names:
Cells(pbmc_small[, ten_cells_id], pt.size=3)
Cells(pbmc_small[, other_ten_ids], pt.size=3)
Cells(pbmc_small[, and_other_ten], pt.size=3)
No surprises here; it yields three different things as expected.
> Cells(pbmc_small[, ten_cells_id], pt.size=3)
[1] "CATGAGACACGGGA" "CGTAGCCTGTATGC" "ACTCGCACGAAAGT" "CTAGGTGATGGTTG" "TTACGTACGTTCAG" "CATGGCCTGTGCAT"
[7] "ACAGGTACTGGTGT" "AATGTTGACAGTCA" "GATAGAGAAGGGTG" "CATTACACCAACTG"
> Cells(pbmc_small[, other_ten_ids], pt.size=3)
[1] "GGCATATGCTTATC" "ACAGGTACTGGTGT" "CATCAGGATGCACA" "ATGCCAGAACGACT" "GAGTTGTGGTAGCT" "GGCATATGGGGAGT"
[7] "AGAGATGATCTCGC" "GAACCTGATGAACC" "GATATAACACGCAT" "CATGAGACACGGGA"
> Cells(pbmc_small[, and_other_ten], pt.size=3)
[1] "GGGTAACTCTAGTG" "TTTAGCTGTACTCT" "TACATCACGCTAAC" "CTAAACCTGTGCAT" "ATACCACTCTAAGC" "CATGCGCTAGTCAC"
[7] "GATAGAGAAGGGTG" "ATTACCTGCCTTAT" "GCGCATCTTGCTCC" "ACAGGTACTGGTGT"
However, if I do the sampling inside the subsetting call:
cells1 <- pbmc_small[, sample(Cells(pbmc_small), 10)]
cells2 <- pbmc_small[, sample(Cells(pbmc_small), 10)]
cells3 <- pbmc_small[, sample(Cells(pbmc_small), 10)]
Cells(cells1)
Cells(cells2)
Cells(cells3)
I get the same thing three times:
> Cells(cells1)
[1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA" "TACAATGATGCTAG" "CATGAGACACGGGA"
[7] "GCACTAGACCTTTA" "CGTAGCCTGTATGC" "TTACCATGAATCGC" "ATAAGTTGGTACGT"
> Cells(cells2)
[1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA" "TACAATGATGCTAG" "CATGAGACACGGGA"
[7] "GCACTAGACCTTTA" "CGTAGCCTGTATGC" "TTACCATGAATCGC" "ATAAGTTGGTACGT"
> Cells(cells3)
[1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA" "TACAATGATGCTAG" "CATGAGACACGGGA"
[7] "GCACTAGACCTTTA" "CGTAGCCTGTATGC" "TTACCATGAATCGC" "ATAAGTTGGTACGT"
The values are always the same, regardless of the seed I use!
I guess that R is somehow resetting the seed each time. This is not an issue with [] itself, as:
a <- 1:100
a[sample(1:100, 10)]
a[sample(1:100, 10)]
a[sample(1:100, 10)]
Returns three different values.
The only thing I can think of is that something strange is happening because Seurat overloads []. Any ideas?
It looks like this is because [.Seurat() calls subset.Seurat(), which in turn calls WhichCells(). WhichCells() has a seed argument, which defaults to 1. You can override this by setting it to NULL, and thankfully this will also filter through if you pass it to [ like so:
library(Seurat)
#> Attaching SeuratObject
#> Attaching sp
set.seed(12345)
cells1 <- pbmc_small[, sample(Cells(pbmc_small), 10), seed = NULL]
cells2 <- pbmc_small[, sample(Cells(pbmc_small), 10), seed = NULL]
cells3 <- pbmc_small[, sample(Cells(pbmc_small), 10), seed = NULL]
Cells(cells1)
#> [1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA"
#> [5] "TACAATGATGCTAG" "CATGAGACACGGGA" "GCACTAGACCTTTA" "CGTAGCCTGTATGC"
#> [9] "TTACCATGAATCGC" "ATAAGTTGGTACGT"
Cells(cells2)
#> [1] "GTCATACTTCGCCT" "TGGTATCTAAACAG" "ATCATCTGACACCA" "GTTGACGATATCGG"
#> [5] "GACGCTCTCTCTCG" "AGATATACCCGTAA" "CTTCATGACCGAAT" "CTAACGGAACCGAT"
#> [9] "TACTCTGAATCGAC" "GCGTAAACACGGTT"
Cells(cells3)
#> [1] "GTCATACTTCGCCT" "GCTCCATGAGAAGT" "ACAGGTACTGGTGT" "TACATCACGCTAAC"
#> [5] "CCATCCGATTCGCC" "GACGCTCTCTCTCG" "CTTCATGACCGAAT" "GCGTAAACACGGTT"
#> [9] "CATTACACCAACTG" "CTTGATTGATCTTC"
Created on 2022-10-17 with reprex v2.0.2
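To make the effect easier to see without Seurat, here is a minimal base R sketch. The function subset_with_seed() is a hypothetical stand-in that I made up for illustration; it only mimics the idea of a seed argument that defaults to 1 and is not Seurat's actual code.

```r
# Hypothetical stand-in (not Seurat code): a subsetting function that
# resets the global RNG unless seed = NULL is passed.
subset_with_seed <- function(x, idx, seed = 1) {
  if (!is.null(seed)) {
    set.seed(seed)  # side effect on the global RNG state
  }
  # In this sketch idx is only evaluated here, after the reset,
  # because R evaluates function arguments lazily.
  x[idx]
}

a <- 1:100

# Default seed: both calls draw from the same freshly reset RNG state.
set.seed(12345)
same1 <- subset_with_seed(a, sample(100, 10))
same2 <- subset_with_seed(a, sample(100, 10))
identical(same1, same2)  # TRUE

# seed = NULL: the RNG is left alone, so consecutive draws differ.
set.seed(12345)
diff1 <- subset_with_seed(a, sample(100, 10), seed = NULL)
diff2 <- subset_with_seed(a, sample(100, 10), seed = NULL)
identical(diff1, diff2)  # FALSE
```

The point is only that any function calling set.seed() internally changes the RNG state for everything evaluated afterwards, which matches the symptom in the question.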
In my opinion this is quite poorly documented, and the behaviour is confusing enough to possibly justify a new issue at the seurat-object GitHub repository.
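If you cannot pass seed = NULL everywhere, a more general workaround is to snapshot the RNG state before calling a function that reseeds and to restore it afterwards. The sketch below is base R only; with_preserved_rng() and reseeding_subset() are hypothetical names invented for this example, and I have not tested the wrapper against Seurat objects.

```r
# Hypothetical helper: call fun(...), then restore the global RNG state.
with_preserved_rng <- function(fun, ...) {
  args <- list(...)  # force the arguments now, so any sample() in them runs first
  if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    runif(1)  # make sure an RNG state exists to snapshot
  }
  old_state <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  # Put the saved state back when this function exits, even on error.
  on.exit(assign(".Random.seed", old_state, envir = globalenv()), add = TRUE)
  do.call(fun, args)
}

# Hypothetical stand-in for a function that resets the seed internally.
reseeding_subset <- function(x, idx, seed = 1) {
  if (!is.null(seed)) set.seed(seed)
  x[idx]
}

a <- 1:100
set.seed(12345)
draw1 <- with_preserved_rng(reseeding_subset, a, sample(100, 10))
draw2 <- with_preserved_rng(reseeding_subset, a, sample(100, 10))
identical(draw1, draw2)  # FALSE: the RNG stream continues between calls
```

The arguments are forced before the snapshot is taken on purpose: otherwise the state would be restored to the point before the sample() call, and every call would repeat the same draw.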