I have a matrix x (30x2000) of 2000 gene expressions in different cell lines and a vector y (30x1) of a continuous outcome variable. I want to calculate the Pearson correlation between each gene and the outcome, so I expect a 2000x1 vector of r-values. I've used rcorr(x,y), but the result is a 2000x2000 matrix, so I guess it's ignoring y and calculating all genes against all genes. The manual says:
x = a numeric matrix with at least 5 rows and at least 2 columns (if y is absent)
But can I have more than one column and have y too? Do I have to use a different function?
You need to apply the cor function across the columns of your x matrix:
apply( x , 2 , cor , y = y )
# For reproducible data
set.seed(1)
# 3 x 4 matrix
x <- matrix( runif(12) , nrow = 3 )
# [,1] [,2] [,3] [,4]
#[1,] 0.2655087 0.9082078 0.9446753 0.06178627
#[2,] 0.3721239 0.2016819 0.6607978 0.20597457
#[3,] 0.5728534 0.8983897 0.6291140 0.17655675
# Length 3 vector
y <- runif(3)
#[1] 0.6870228 0.3841037 0.7698414
# Length 4 output vector
apply( x , 2 , cor , y = y )
#[1] 0.3712437 0.9764443 0.2249998 -0.4903723
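
As a possible alternative, cor from the stats package also accepts a matrix and a vector directly and correlates each column of x with y, so the explicit apply call may not be needed. The sketch below uses simulated data with the dimensions from the question (the values are made up and are not the asker's data) and checks that both approaches agree:

```r
set.seed(1)

# Simulated stand-in for the real data (assumption): 30 cell lines x 2000 genes
x <- matrix(rnorm(30 * 2000), nrow = 30)
y <- rnorm(30)

# Column-by-column approach, as in the answer above
r_apply <- apply(x, 2, cor, y = y)

# Direct call: cor() correlates every column of x with y
r_direct <- cor(x, y)     # 2000 x 1 matrix
r_vec <- drop(r_direct)   # drop the extra dimension to get a plain vector

length(r_apply)           # number of genes
dim(r_direct)             # rows = genes, one column for y
all.equal(r_apply, r_vec) # both approaches should agree
```

Both calls default to method = "pearson". The direct call returns a one-column matrix, which is why drop is used when a plain vector is wanted.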