Questions about DESeq2 and setting up logical vectors to select a result file

I am very new to use DESeq2 and R language for processing sequencing data. I have R and R-studio installed successfully, together with all necessary packages. But when I followed some tutorial to understand the design and contrast, I found myself lost. Here is the link to the tutorial: https://www.r-bloggers.com/2024/05/a-guide-to-designs-and-contrasts-in-deseq2/

My specific question is -- following the instruction, I got a result file called res.

res <- results(dds1, contrast = list("conditiontreatment", "conditioncontrol"))

Here I get res with a dimension of (1000,6)

I want to select the genes that have a p-value < 0.05 in the result file (res), so I set up the following logical vector: significant_p <- res$pvalue < 0.05 when I tried to use this logical vector to select the res file, it said that there are na values in res, so I dropped all the na values by doing the following: filtered_res <- na.omit(res)

Now I can use the logical vector on the filtered_res to select for the genes' expression with a pvalue < 0.05. However, the system returned the following error:

filtered_res[significant_p]
Error: subscript is a logical vector with out-of-bounds TRUE values

But when I used the logical vector with dds1, then it worked.

dds1[significant_p]
class: DESeqDataSet 
dim: 465 6 
metadata(1): version
assays(4): counts mu H cooks
rownames(465): gene1 gene2 ... gene996 gene997
rowData names(25): trueIntercept trueBeta ... deviance maxCooks
colnames(6): sample1 sample2 ... sample5 sample6
colData names(2): condition sizeFactor

I don't quite understand the datastructure here very well. Obviously I hope that the logical vector would work on the res file, instead of the original file dds1. Can anyone help explain a little to me?

Thank you very much!

Solution

One potential issue is:

significant_p <- res$pvalue < 0.05

This is a logical evaluation, 'is the pvalue < 0.05?', and it will return TRUE / FALSE / NA for each row.

Instead, try:

significant_p <- res[res$pvalue < 0.05 & !is.na(res$pvalue),]

E.g. with a minimal reproducible example:

res <- data.frame(pvalue = c(0.01, 0.02, NA, 0.07),
                  values = c(4, 5, 6, 7),
                  gene = c("gene_A", "gene_B", "gene_C", "gene_D"))

res
#>   pvalue values   gene
#> 1   0.01      4 gene_A
#> 2   0.02      5 gene_B
#> 3     NA      6 gene_C
#> 4   0.07      7 gene_D

# result for your existing code
res$pvalue < 0.05
#> [1]  TRUE  TRUE    NA FALSE

# a different approach to select all rows that are < 0.05 and not NA
res[res$pvalue < 0.05 & !is.na(res$pvalue),]
#>   pvalue values   gene
#> 1   0.01      4 gene_A
#> 2   0.02      5 gene_B

# or, if you only want the p-values that are < 0.05
res[res$pvalue < 0.05 & !is.na(res$pvalue),]$pvalue
#> [1] 0.01 0.02

# or, if you only want the gene names for rows that have p-value < 0.05
res[res$pvalue < 0.05 & !is.na(res$pvalue),]$gene
#> [1] "gene_A" "gene_B"

^{Created on 2024-09-19 with reprex v2.1.0}

Does that help?