I am performing an RNA-seq analysis and I need a logical vector. However, I am starting from a SimpleLogicalList called hk with 58037 elements, which I obtained from hk <- features.info$symbol %in% house_keeping_genes, where features.info is a data frame and house_keeping_genes is a vector.
After calling unlist(hk), 58731 elements are returned. I then realized that some elements of the list contain more than one value (instead of just FALSE, they contain FALSE FALSE FALSE), which increases the length of the result.
I then tried unlist(unique(hk)), and most of the unexpected values were dropped, but there were still 58041 elements instead of 58037, and I have no idea where they are coming from. I checked, and no NA values are being generated.
How can I find where those 4 extra elements are coming from?
> dput(hk[60:70])
new("SimpleLogicalList", elementType = "logical", elementMetadata = NULL,
metadata = list(), listData = list(ENSG00000004777 = FALSE,
ENSG00000004779 = FALSE, ENSG00000004799 = FALSE, ENSG00000004809 = FALSE,
ENSG00000004838 = FALSE, ENSG00000004846 = FALSE, ENSG00000004848 = FALSE,
ENSG00000004864 = FALSE, ENSG00000004866 = c(FALSE, FALSE,
FALSE), ENSG00000004897 = TRUE, ENSG00000004939 = FALSE))
> dput(features.info$symbol[1:5])
new("SimpleCharacterList", elementType = "character", elementMetadata = NULL,
metadata = list(), listData = list(ENSG00000000003 = "TSPAN6",
ENSG00000000005 = "TNMD", ENSG00000000419 = "DPM1", ENSG00000000457 = "SCYL3",
ENSG00000000460 = "C1orf112"))
> dput(house_keeping_genes[1:5])
c("DPM1", "SCYL3", "GCLC", "BAD", "LAP3")
Edit: I need the logical vector to pass it as an argument to the RUVg() function. If I pass hk directly, I get this error: Error in Ycenter[, cIdx] : invalid subscript type 'S4'
Packages:
other attached packages:
[1] NCmisc_1.1.6 RUVSeq_1.28.0 EDASeq_2.28.0 ShortRead_1.52.0
[5] GenomicAlignments_1.30.0 Rsamtools_2.10.0 Biostrings_2.62.0 XVector_0.34.0
[9] snpStats_1.44.0 Matrix_1.4-0 survival_3.2-13 sva_3.42.0
[13] BiocParallel_1.28.3 genefilter_1.76.0 mgcv_1.8-38 nlme_3.1-153
[17] pheatmap_1.0.12 ggfortify_0.4.14 ggplot2_3.3.5 edgeR_3.36.0
[21] limma_3.50.0 dplyr_1.0.7 SummarizedExperiment_1.24.0 GenomicRanges_1.46.1
[25] GenomeInfoDb_1.30.0 IRanges_2.28.0 S4Vectors_0.32.3 MatrixGenerics_1.6.0
[29] matrixStats_0.61.0 tweeDEseqCountData_1.32.0 Biobase_2.54.0 BiocGenerics_0.40.0
The issue is that some elements of symbol (which is a SimpleCharacterList, not a plain character vector) contain more than one value, so %in% returns more than one logical value for those genes. We can loop over the list with sapply and wrap the comparison in any, which returns a single TRUE/FALSE per list element: TRUE if any of the values in that element are present in house_keeping_genes. The \(x) syntax is a concise way to write a lambda function (function(x)), available in R 4.1.0 and later.
hk1 <- sapply(features.info$symbol, \(x) any(x %in% house_keeping_genes, na.rm = TRUE))
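To locate the genes responsible for the extra elements, it may help to look at the length of each list element. The following is only a minimal sketch on made-up data: the gene IDs and the symbols GENE1 to GENE4 are hypothetical, and a plain named list stands in for features.info$symbol so that the example runs without Bioconductor. It assumes the leftover elements after unique() come from genes whose symbols are partly housekeeping and partly not, which should be checked against the real data.

```r
# Hypothetical toy data standing in for features.info$symbol (a CharacterList);
# a plain named list is used so the sketch runs in base R.
symbol <- list(
  ENSG_A = "TSPAN6",
  ENSG_B = "DPM1",
  ENSG_C = c("GENE1", "GENE2", "GENE3"),  # several symbols, none housekeeping
  ENSG_D = c("BAD", "GENE4")              # several symbols, one housekeeping
)
house_keeping_genes <- c("DPM1", "SCYL3", "GCLC", "BAD", "LAP3")

# Element-wise %in%, mimicking the list-valued hk from the question
hk <- lapply(symbol, function(x) x %in% house_keeping_genes)

# Genes whose entry holds more than one logical value
multi <- names(hk)[lengths(hk) > 1]

# Genes that still hold more than one value after de-duplicating
# within each element, i.e. a mix of TRUE and FALSE
mixed <- names(hk)[lengths(lapply(hk, unique)) > 1]

# Collapse to exactly one logical value per gene
hk1 <- vapply(symbol, function(x) any(x %in% house_keeping_genes), logical(1))
```

Here multi lists every gene with more than one symbol, while mixed lists only those with both TRUE and FALSE values, which are the ones that would survive de-duplication within each element. vapply with logical(1) is used instead of sapply so that the call fails loudly if any element does not reduce to a single logical value.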