makeContrasts between two different sets of data


I need to find differentially expressed genes between 35 lines (microarray data). The names of 30 lines start with RAL and the names of the other 5 start with ZI. I want to contrast the 30 RAL lines with the 5 ZI lines. Since I don't want to type all 150 pairwise contrasts (30 × 5) by hand, I wanted to use makeContrasts.

Here are my sample labels:

dput(sampletype)

structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 
5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 9L, 9L, 9L, 10L, 
10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 13L, 13L, 13L, 14L, 14L, 
14L, 15L, 15L, 15L, 16L, 16L, 16L, 17L, 17L, 17L, 18L, 18L, 18L, 
19L, 19L, 19L, 20L, 20L, 20L, 21L, 21L, 21L, 22L, 22L, 22L, 23L, 
23L, 23L, 24L, 24L, 24L, 25L, 25L, 25L, 26L, 26L, 26L, 27L, 27L, 
27L, 28L, 28L, 28L, 29L, 29L, 29L, 30L, 30L, 30L, 31L, 31L, 32L, 
32L, 32L, 33L, 33L, 33L, 34L, 34L, 34L, 35L, 35L, 35L), .Label = c("RAL307", 
"RAL820", "RAL705", "RAL765", "RAL852", "RAL799", "RAL301", "RAL427", 
"RAL437", "RAL315", "RAL357", "RAL304", "RAL391", "RAL313", "RAL486", 
"RAL380", "RAL859", "RAL786", "RAL399", "RAL358", "RAL360", "RAL517", 
"RAL639", "RAL732", "RAL379", "RAL555", "RAL324", "RAL774", "RAL42", 
"RAL181", "ZI50N", "ZI186N", "ZI357N", "ZI31N", "ZI197N"), class = "factor")
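Because every level name starts with either RAL or ZI, the dput() output above is enough to recover the group of each array. The base-R sketch below rebuilds the factor from those level names and integer codes (3 arrays per line, except ZI50N with only 2) and collapses it to a two-level RAL/ZI factor. The names `line_names`, `n_arrays` and `group` are illustrative and not part of the original post.

```r
# Level names copied from the dput() output above, in factor-level order.
line_names <- c("RAL307", "RAL820", "RAL705", "RAL765", "RAL852", "RAL799",
                "RAL301", "RAL427", "RAL437", "RAL315", "RAL357", "RAL304",
                "RAL391", "RAL313", "RAL486", "RAL380", "RAL859", "RAL786",
                "RAL399", "RAL358", "RAL360", "RAL517", "RAL639", "RAL732",
                "RAL379", "RAL555", "RAL324", "RAL774", "RAL42", "RAL181",
                "ZI50N", "ZI186N", "ZI357N", "ZI31N", "ZI197N")

# Arrays per line, read off the integer codes: 3 each,
# except level 31 (ZI50N), which appears only twice.
n_arrays <- c(rep(3, 30), 2, rep(3, 4))
sampletype <- factor(rep(line_names, times = n_arrays), levels = line_names)

# Collapse each array's line name to its group using the name prefix.
group <- factor(ifelse(grepl("^RAL", as.character(sampletype)), "RAL", "ZI"))
table(group)
```

With this data the table should show 90 RAL and 14 ZI arrays, 104 in total, which is the length any per-array factor passed to model.matrix() must have.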

design.matrix <- model.matrix(~ 0 + sampletype)
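For reference, `~ 0 +` drops the intercept, so model.matrix() creates one indicator column per level and names it by pasting the variable name in front of the level. The toy sketch below uses only three of the real line names with made-up replicate counts (an assumption for illustration) and strips the prefix so that the column names can be used directly in makeContrasts().

```r
# Toy stand-in for `sampletype`: three of the real line names,
# with replicate counts chosen only for illustration.
sampletype <- factor(c("RAL307", "RAL307", "RAL820", "RAL820", "ZI50N", "ZI50N"),
                     levels = c("RAL307", "RAL820", "ZI50N"))

# No intercept: one 0/1 indicator column per level
design.matrix <- model.matrix(~ 0 + sampletype)
colnames(design.matrix)
# "sampletypeRAL307" "sampletypeRAL820" "sampletypeZI50N"

# Strip the variable-name prefix so contrasts can refer to the bare line names
colnames(design.matrix) <- sub("^sampletype", "", colnames(design.matrix))
colnames(design.matrix)
# "RAL307" "RAL820" "ZI50N"
```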

How can I get contrasts such as "RAL517-ZI50", "RAL852-ZI50", "RAL517-ZI42", "RAL852-ZI42"?

Is there any way I can do this?
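One way to avoid typing the 150 pairwise contrasts (30 RAL × 5 ZI) by hand is to build the contrast strings programmatically and pass them to limma's makeContrasts() through its `contrasts` argument, which accepts a character vector. This is a minimal sketch, assuming the design-matrix columns have already been renamed to the bare line names. The four level names below are a hypothetical subset; with the real data you would use `levels(sampletype)` or the cleaned `colnames(design.matrix)`. Note that the ZI level names carry a trailing N (e.g. ZI50N), so the strings must match the column names exactly.

```r
# Hypothetical subset of level names; with the real data use
# lvls <- levels(sampletype)  (30 RAL + 5 ZI -> 150 contrasts)
lvls <- c("RAL517", "RAL852", "ZI50N", "ZI186N")

ral <- grep("^RAL", lvls, value = TRUE)
zi  <- grep("^ZI",  lvls, value = TRUE)

# Every RAL-minus-ZI difference as a string, e.g. "RAL517-ZI50N"
contrast.names <- as.vector(outer(ral, zi, paste, sep = "-"))
contrast.names

# makeContrasts() takes a character vector via its `contrasts` argument;
# guarded so the string-building part also runs without limma installed.
if (requireNamespace("limma", quietly = TRUE)) {
  contrast.matrix <- limma::makeContrasts(contrasts = contrast.names,
                                          levels = lvls)
  print(dim(contrast.matrix))  # one row per level, one column per contrast
}
```

A single contrast can then be inspected after contrasts.fit() and eBayes() with, for example, `topTable(fit2, coef = "RAL517-ZI50N")`.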

Here is my sessionInfo():

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gplots_2.12.1      reshape2_1.2.2     ggplot2_0.9.3.1    affy_1.38.1        vsn_3.28.0         Biobase_2.20.1    
[7] BiocGenerics_0.6.0 limma_3.16.8      

loaded via a namespace (and not attached):
 [1] BiocInstaller_1.10.4  KernSmooth_2.23-10    MASS_7.3-29           RColorBrewer_1.0-5    affyio_1.28.0        
 [6] bitops_1.0-6          caTools_1.14          colorspace_1.2-4      dichromat_2.0-0       digest_0.6.3         
[11] gdata_2.13.2          grid_3.0.2            gtable_0.1.2          gtools_3.1.0          labeling_0.2         
[16] lattice_0.20-23       munsell_0.4.2         plyr_1.8              preprocessCore_1.22.0 proto_0.3-10         
[21] scales_0.2.3          stringr_0.6.2         tools_3.0.2           zlibbioc_1.6.0       

Thanks


Solution

  • As you have a class-comparison problem between two groups, I suggest you read the user's guide of the limma package from Bioconductor, a popular package for identifying differentially expressed genes (http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf). If you are working with one-color microarrays, you can focus on section 9.2.

    By the way, you need to create a two-level factor (RAL vs. ZI) to perform the comparison:

    # build the design matrix
    
    library(limma)
    
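    # one label per array (not per line), taken from the RAL/ZI prefix of each line name
    yourfactor <- factor(ifelse(grepl("^RAL", sampletype), "RAL", "ZI"))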
    design <- model.matrix(~ 0 + yourfactor)
    colnames(design) <- gsub("yourfactor", "", colnames(design)) # to simplify the colnames of design
    
    # perform the comparison
    
    
    fit <- lmFit(data, design)    # data: your expression matrix (genes in rows, arrays in columns)
    contrast.matrix <- makeContrasts(RAL-ZI, levels=design)
    fit2 <- contrasts.fit(fit, contrast.matrix)
    fit2 <- eBayes(fit2)
    
    # summarize the results of the linear model
    results <- topTable(fit2, number=nrow(data), adjust.method="BH")
    

    Make sure that the columns (samples) of the expression matrix and the labels in your factor are in the same order. To avoid this kind of problem, I suggest creating an ExpressionSet object (http://www.bioconductor.org/packages/release/bioc/html/Biobase.html), which is very useful for manipulating gene expression data.

    I hope this was helpful,

    Best.

    Matteo
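The ordering caveat in the answer can also be enforced in base R before calling lmFit(). This is a minimal sketch, assuming a hypothetical annotation table `targets` with one row per array (columns `array` and `line`) and array IDs as the column names of the expression matrix; the toy data are made up.

```r
set.seed(1)
# Hypothetical toy data: 3 genes x 4 arrays, columns named by array ID
data <- matrix(rnorm(12), nrow = 3,
               dimnames = list(paste0("gene", 1:3), c("a1", "a2", "a3", "a4")))

# Hypothetical annotation listing the same arrays in a different order
targets <- data.frame(array = c("a3", "a1", "a4", "a2"),
                      line  = c("ZI50N", "RAL307", "ZI50N", "RAL307"),
                      stringsAsFactors = FALSE)

# Reorder the annotation rows to follow the matrix columns, then verify
targets <- targets[match(colnames(data), targets$array), ]
stopifnot(identical(targets$array, colnames(data)))

# Build the two-level factor only after the order has been aligned
yourfactor <- factor(ifelse(grepl("^RAL", targets$line), "RAL", "ZI"))
yourfactor
```

If the stopifnot() check fails, the annotation and the expression matrix do not describe the same arrays in the same order, and that should be fixed before building the design matrix.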