Search code examples
rbioinformaticsbioconductorrna-seq

DESeq2 code help for analysing differential gene expression between disease and control patients in R


I am having trouble devising some correct code to work out using DESeq2 in R, how to look at which genes are more highly expressed in the disease patients compared to the control.

My data is currently a large data frame consisting of 600000 gene names which are rows. The first three columns [1:3] are control patients and the last three are from pancreatic cancer patients [4:6].

DF = data.frame(id=colnames(genetable),type=rep(c("treat","ctrl"),each=3))
# run DESeq2
dds = DESeqDataSetFromMatrix(genetable,DF,~type)
ddsoutput <- DESeq(dds)
summary(results(dds),alpha=0.05)

To get the number of genes which are below 0.05;

res <- results(ddsoutput)
res[which(res$padj < 0.05),]

I then need to work out fold change between disease vs control, both upregulated and downregulated. I need to set my own fold change threshold, and then analyse the function of the most upregulated or downregulated genes based on my p values via KEGG;

 upreg <- subset(res, log2FoldChange > 1)

 downreg <- subset(res, log2FoldChange < 1)

But none of the above is working, I think I am going wrong by using type=rep maybe? I ultimately need to save the p value and log fold change value into a table.


Solution

  • there's a bug in your code, I corrected it below, and placed the previous it under ##. You replace the original DESeq2 object with the output of the function DESeq()

    DF = data.frame(id=colnames(genetable),type=rep(c("treat","ctrl"),each=3))
    # run DESeq2
    dds = DESeqDataSetFromMatrix(genetable,DF,~type)
    dds <- DESeq(dds)
    # it was
    # ddsoutput <- DESeq(dds)
    summary(results(dds),alpha=0.05)
    

    Now run the rest of the script and it should be fine