I am trying to calculate the correlation of a specific gene (here, Gene 1) with all the others (35999 candidates) in a matrix:
samp1 samp2 samp3 samp4
Gene 1 3.7891 2.4487 1.1939 0.6013
Gene 2 1.4484 3.2316 2.841 1.9545
Gene 3 0.4505 2.6062 2.0729 0.6403
.
.
Gene 36000 1.8828 5.2633 2.7552 1.7335
I used the following code:
library(Hmisc)
A <- read.table("C:/Users/Desktop/exp.txt", header=T, sep="\t")
cor <- rcorr(as.matrix(A), type="pearson")
write.csv(cor$r,'C:/Users/Desktop/pCC VALUES.csv')
write.csv(cor$P,'C:/Users/Desktop/p VALUES.csv')
However, the code above computes the full 36000 x 36000 correlation matrix. I would like a one-to-many correlation in which the gene of interest is always the first gene (here, Gene 1), which would save processing time. One option is obviously to extract the 35999 pairs of interest from that output, but I would like to know whether there is another way to correlate my gene against all the others without the many-to-many calculation.
Edit:
I am looking for output in the following format:
Gene 1 Gene 2 pcc p-value
Gene 1 Gene 3 pcc p-value
.
.
Gene 1 Gene 36000 pcc p-value
end
If I understand you correctly (you want to correlate the first row with every other row, one at a time), something along these lines might get you started:
dat <- as.matrix(read.table(text = "samp1;samp2;samp3;samp4
Gene 1;3.7891;2.4487;1.1939;0.6013
Gene 2;1.4484;3.2316;2.841;1.9545
Gene 3;0.4505;2.6062;2.0729;0.6403
Gene 4;0.4705;2.4062;1.0729;0.6003
Gene 5;1.8828;5.2633;2.7552;1.7335", sep=";"))
corr_list <- list()
for (i in 2:nrow(dat)) {
  r <- cor.test(dat[1,], dat[i,])
  corr_list[[paste("Genes 1 &", i)]] <- c(r$estimate, p.val=r$p.value)
}
# Results
corr_list
$`Genes 1 & 2`
cor p.val
-0.3070573 0.6929427
$`Genes 1 & 3`
cor p.val
-0.1417635 0.8582365
$`Genes 1 & 4`
cor p.val
0.04777015 0.95222985
$`Genes 1 & 5`
cor p.val
0.1425788 0.8574212
You can also put the results in a data.frame if that is more convenient:
corr_list <- data.frame(Gene1=numeric(), Gene2=numeric(), cor=numeric(), p.value=numeric())
for (i in 2:nrow(dat)) {
  r <- cor.test(dat[1,], dat[i,])
  corr_list[i-1,] <- c(1, i, r$estimate, r$p.value)
}
corr_list
  Gene1 Gene2         cor   p.value
1     1     2 -0.30705735 0.6929427
2     1     3 -0.14176355 0.8582365
3     1     4  0.04777015 0.9522299
4     1     5  0.14257884 0.8574212
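With 36000 genes, calling cor.test once per row may be slow. As a possible alternative (a minimal sketch, not a tested solution for the real data set), base R's cor() accepts a matrix and a vector and returns all the one-to-many coefficients in a single call; the p-values can then be derived from the t statistic with n - 2 degrees of freedom, which is the same test cor.test uses for Pearson correlation. The sketch assumes genes are in rows, samples are in columns, and there are no missing values. The names target, others and result are only illustrative.

```r
# Same toy data as above: genes in rows, samples in columns
dat <- as.matrix(read.table(text = "samp1;samp2;samp3;samp4
Gene 1;3.7891;2.4487;1.1939;0.6013
Gene 2;1.4484;3.2316;2.841;1.9545
Gene 3;0.4505;2.6062;2.0729;0.6403
Gene 4;0.4705;2.4062;1.0729;0.6003
Gene 5;1.8828;5.2633;2.7552;1.7335", sep=";"))

target <- dat[1, ]                  # gene of interest
others <- dat[-1, , drop = FALSE]   # every other gene

# cor() correlates each column of a matrix with a vector,
# so transpose to put the genes in columns
r <- cor(t(others), target)[, 1]

# Two-sided p-value from the t statistic (assumes no missing values)
n <- ncol(dat)
tstat <- r * sqrt((n - 2) / (1 - r^2))
p <- 2 * pt(-abs(tstat), df = n - 2)

result <- data.frame(gene1 = rownames(dat)[1],
                     gene2 = rownames(others),
                     pcc = r,
                     p.value = p,
                     row.names = NULL)
result
```

On the toy data this should agree with the cor.test results shown above, and result can be passed to write.csv in the same way as in the question.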