One to many correlation calculation in a matrix in R


I am trying to calculate the correlation of one specific gene (here, Gene 1) with all the others (35,999 candidates) in a matrix:

         samp1   samp2   samp3   samp4
Gene 1  3.7891  2.4487  1.1939  0.6013
Gene 2  1.4484  3.2316  2.841   1.9545
Gene 3  0.4505  2.6062  2.0729  0.6403
.
.
Gene 36000  1.8828  5.2633  2.7552  1.7335

I used the following code:

library(Hmisc)
A <- read.table("C:/Users/Desktop/exp.txt", header=T, sep="\t")
cor <- rcorr(as.matrix(A), type="pearson")


write.csv(cor$r,'C:/Users/Desktop/pCC VALUES.csv')
write.csv(cor$P,'C:/Users/Desktop/p VALUES.csv')

However, this code performs the full 36000 × 36000 matrix calculation. I only want the one-to-many correlations, where the gene of interest is always the first gene (here, Gene 1), since that should save processing time. One obvious approach is to extract the pairs of interest from the full output. Is there another way to correlate my gene against all the others without doing the many-to-many calculation?

Edit:

I am looking for output in the following format:

Gene 1 Gene 2   pcc  p-value
Gene 1 Gene 3   pcc  p-value
.
.
Gene 1 Gene 36000 pcc p-value

Solution

  • If I understand correctly (you want to correlate the first row with every other row, one pair at a time), something along these lines might get you started:

    dat <- as.matrix(read.table(text = "samp1;samp2;samp3;samp4
    Gene 1;3.7891;2.4487;1.1939;0.6013
    Gene 2;1.4484;3.2316;2.841;1.9545
    Gene 3;0.4505;2.6062;2.0729;0.6403
    Gene 4;0.4705;2.4062;1.0729;0.6003
    Gene 5;1.8828;5.2633;2.7552;1.7335", sep=";"))
    
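    # Collect one result per pair (Gene 1 vs. gene i)
    corr_list <- list()
    
    for (i in 2:nrow(dat)) {
      # cor.test() defaults to Pearson and returns both r and the p-value
      r <- cor.test(dat[1,], dat[i,])
      corr_list[[paste("Genes 1 &", i)]] <- c(r$estimate, p.val=r$p.value)
    }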
    
    
    # Results
    corr_list
    
    $`Genes 1 & 2`
           cor      p.val 
    -0.3070573  0.6929427 
    
    $`Genes 1 & 3`
           cor      p.val 
    -0.1417635  0.8582365 
    
    $`Genes 1 & 4`
           cor      p.val 
    0.04777015 0.95222985 
    
    $`Genes 1 & 5`
          cor     p.val 
    0.1425788 0.8574212 
    

    You can also put the results in a data.frame if that is more convenient:

    corr_list <- data.frame(Gene1=numeric(), Gene2=numeric(), cor=numeric(), p.value=numeric())
    
    for (i in 2:nrow(dat)) {
      r <- cor.test(dat[1,], dat[i,])
      corr_list[i-1,] <- c(1, i, r$estimate, r$p.value)
    }
    
    corr_list
    
      Gene1 Gene2         cor   p.value
    1     1     2 -0.30705735 0.6929427
    2     1     3 -0.14176355 0.8582365
    3     1     4  0.04777015 0.9522299
    4     1     5  0.14257884 0.8574212
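
With 36,000 genes, calling `cor.test()` once per row may be slow. As a possible alternative (a sketch, not part of the original answer), base R's `cor()` can correlate one vector against all columns of a matrix in a single call, and the two-sided Pearson p-value can then be computed from the t statistic with n - 2 degrees of freedom. The names `r`, `t_stat`, `p`, `res`, `gene_a` and `gene_b` are illustrative, and the toy data is the same five-gene example used above.

```r
# Toy data from the answer above: genes in rows, samples in columns.
dat <- rbind(
  "Gene 1" = c(3.7891, 2.4487, 1.1939, 0.6013),
  "Gene 2" = c(1.4484, 3.2316, 2.8410, 1.9545),
  "Gene 3" = c(0.4505, 2.6062, 2.0729, 0.6403),
  "Gene 4" = c(0.4705, 2.4062, 1.0729, 0.6003),
  "Gene 5" = c(1.8828, 5.2633, 2.7552, 1.7335)
)
colnames(dat) <- paste0("samp", 1:4)

n <- ncol(dat)  # number of samples per gene

# cor() works on columns, so transpose to put the genes in columns.
# The result is a (genes - 1) x 1 matrix; drop() turns it into a named vector.
r <- drop(cor(t(dat[-1, , drop = FALSE]), dat[1, ]))

# Two-sided p-value from the t statistic with n - 2 degrees of freedom
# (the test cor.test() uses for Pearson correlation).
t_stat <- r * sqrt((n - 2) / (1 - r^2))
p <- 2 * pt(-abs(t_stat), df = n - 2)

# One row per pair, in the layout requested in the question.
res <- data.frame(
  gene_a    = rownames(dat)[1],
  gene_b    = rownames(dat)[-1],
  pcc       = r,
  p.value   = p,
  row.names = NULL
)
res
```

This assumes there are no missing values and that no gene is constant across samples; a constant row has zero variance, so `cor()` returns `NA` for it (with a warning). On the toy data the `pcc` and `p.value` columns should agree with the `cor.test()` loop.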