Search code examples
roverlapvenn-diagram

Overlapping two gene sets ,finding their overlap significance and plotting them


(Fig. 3a, b, Extended Data Fig. 3a, b and Supplementary Table 1). After 48 h, more than one-third of the transcriptome was differentially expressed (>5,000 genes; 405 genes encoding for proteins in the extracellular region, Gene Ontology (GO) accession 0005576), significantly overlapping with the gene expression changes of A375 tumours in vivo after 5 days of vemurafenib treatment (Fig. 3a, b and Extended Data Fig. 3c). Similar extensive gene expression changes were observed in Colo800 and UACC62 melanoma cells treated with vemurafenib and H3122 lung adenocarcinoma cells treated with crizotinib (Extended Data Fig. 3d). Despite different cell lineages, different oncogenic drivers, and different targeted therapies we observed a significant overlap between the secretome of melanoma and lung adenocarcinoma cells (P < 9.11 × 10−5)

The original paper

figure

I would like to see similar to the figure f where it shows the intersection and significance overlap. To achieve that i got this code working till the intersection part but I dont know how to run the significance part.

library(reshape2)
library(venneuler)
RNA_seq_cds <- read.csv("~/Downloads/RNA_seq_gene_set.txt", header=TRUE, sep="\t")
head(RNA_seq_cds)
ATAC_seq <- read.csv("~/Downloads/ATAC_seq_gene_set.txt", header=TRUE, sep="\t")
head(ATAC_seq)
RNA_seq <- RNA_seq_cds
ATAC_seq <- ATAC_seq

#https://stackoverflow.com/questions/6988184/combining-two-data-frames-of-different-lengths
cbindPad <- function(...) {
  args <- list(...)
  n <- sapply(args, nrow)
  mx <- max(n)
  pad <- function(x, mx) {
    if (nrow(x) < mx) {
      nms <- colnames(x)
      padTemp <- matrix(NA, mx - nrow(x), ncol(x))
      colnames(padTemp) <- nms
      if (ncol(x) == 0) {
        return(padTemp)
      } else {
        return(rbind(x, padTemp))
      }
    } else {
      return(x)
    }
  }
  rs <- lapply(args, pad, mx)
  return(do.call(cbind, rs))
}

dat <- cbindPad(ATAC_seq, RNA_seq)

vennfun <- function(x) { 
  x$id <- seq(1, nrow(x))  #add a column of numbers (required for melt)
  xm <- melt(x, id.vars="id", na.rm=TRUE)  #melt table into two columns (value & variable)
  xc <- dcast(xm, value~variable, fun.aggregate=length)  #remove NA's, list presence/absence of each value for each variable (1 or 0)
  rownames(xc) <- xc$value  #value column=rownames (required for Venneuler)
  xc$value <- NULL  #remove redundent value column
  xc  #output the new dataframe
}

#https://stackoverflow.com/questions/9121956/legend-venn-diagram-in-venneuler
VennDat <- vennfun(dat)
genes.venn <- venneuler(VennDat)
genes.venn$labels <- c("RNA", "\nATAC" )
# plot(genes.venn, cex =15, )
#https://stackoverflow.com/questions/30225151/how-to-create-venn-diagram-in-r-studio-from-group-of-three-frequency-column
#https://rstudio-pubs-static.s3.amazonaws.com/13301_6641d73cfac741a59c0a851feb99e98b.html   
vd <- venneuler(VennDat)
vd$labels <- paste(genes.venn$labels, colSums(VennDat))
plot(vd, cex=10)
text(.3, .45, 
     bquote(bold("Common ="~.(as.character(sum(rowSums(VennDat) == 2))))), 
     col="red", cex=1)

LABS <- vd$labels

The above code gives me the intersection plot intersection_plot

Now the significance part how do i do that between two gene sets and show it as shown in the original figure.

My data which i have used to generate the above plot

Any suggestion or help would be really appreciated.


Solution

  • If you talk about how to place any text under your figure, just use 'text' as you did before. It's just some guessing on which x= and y= coordinates. Thexpd=TRUE allows you to plot over the margin.

    VennDat <- vennfun(dat)
    vd <- venneuler(VennDat)
    vd$labels <- paste(c("RNA", "ATAC"), colSums(VennDat))
    
    plot(vd, cex=10, border=c(NA, 'red'), col=c('#6b65af', '#ad7261'))
    text(x=.5, y=.5, sum(rowSums(VennDat) == 2), xpd=TRUE)
    text(.5, .15, 'overlap\n', xpd=TRUE)
    text(.5, .13, bquote(italic(p)*'< 9.11E-55'), xpd=TRUE)
    

    enter image description here

    I also adjusted some parameters of plot. You may inspect the code of the plotting method using:

    venneuler:::plot.VennDiagram
    

    If you want to know how significance is calculated, you should post your question at Cross Validated.