Search code examples
rbioinformaticsbioconductorlimma

Issues with limma for analysis of microarray gene expression data - possibly related to design matrix


I am fairly new to R, and have recently started using it to analyse some microarray data. The overall aim of the analysis is to take DC2 and compare the WT vs KO groups in this population. But I have come across some problems with the limma processing. After processing the data using the oligo package, I have then tried to create a design matrix for analysis using limma. This is my workflow for the ExpressionSet of DC2:

pData(DC2)

         index filename genotype cell_type
1 KO DC2     2 HP10.CEL       KO       DC2
2 KO DC2     3 HP11.CEL       KO       DC2
3 KO DC2     4 HP12.CEL       KO       DC2
1 WT DC2    10  HP7.CEL       WT       DC2
2 WT DC2    11  HP8.CEL       WT       DC2
3 WT DC2    12  HP9.CEL       WT       DC2    

design <- model.matrix(~DC2$genotype)
design

  (Intercept) DC2$genotypeWT
1           1              0
2           1              0
3           1              0
4           1              1
5           1              1
6           1              1


fit <- lmFit(DC2, design)
fit <- eBayes(fit)
toptable(fit)

This feeds out a list of genes as follows:

            logFC        t      P.Value    adj.P.Val        B
17551163 14.09722 208.2627 2.990326e-13 2.700912e-10 17.14467
17511316 13.91167 205.0811 3.292503e-13 2.700912e-10 17.12716
17551167 13.92093 204.5801 3.343243e-13 2.700912e-10 17.12434
17375373 13.76320 202.1271 3.605170e-13 2.700912e-10 17.11025
17550685 13.74022 201.5428 3.671032e-13 2.700912e-10 17.10682

However, when I go to check this manually (just taking the first feature) using this code:

toptable(fit, n=1)
genename <- rownames(toptable(fit, n=1))
typeMean <- tapply(exprs(DC2)[genename,], DC2$genotype, mean)
typeMean["KO"] - typeMean["WT"]

The output for the same feature "17551163" is different

     KO 
0.04538317 

I have tried to search around for an answer, but with no luck. I'm assuming that this may be something to do with the matrix design? Any help would be greatly appreciated.

Thanks


Solution

  • An answer, for those who skip reading the discussion in comments below the question.

    After performing the estimation with lmFit and eBayes we can question the top differentiating genes between all the contrasts that we provided in the model.matrix step.

    Here, the author created the design as follows: design <- model.matrix(~DC2$genotype). Keeping in mind that the (Intercept) is the first coefficient if we want need to explicitly say that we want the contrast related to the DC2$genotype, so the call should be:

    toptable(fit, coef = 2)
    

    Naturally, if the design contains more contrasts they are assigned consecutive natural numbers.

    REMARK

    If we want to remove the intercept from the design design <- model.matrix(~ -1 + DC2$genotype); the first coefficient is now DC2$genotype.