R: Pearson correlation rcorr(x,y) [x=matrix, y=vector] ignores y


I have a matrix x (30 x 2000) of expression values for 2000 genes across 30 cell lines, and a vector y (length 30) holding a continuous outcome. I want the Pearson correlation between each gene and the outcome, so I expect a 2000 x 1 vector of r-values. I used rcorr(x, y), but the result is a 2000 x 2000 matrix, so I guess it is ignoring y and correlating every gene against every other gene. The manual says:

x = a numeric matrix with at least 5 rows and at least 2 columns (if y is absent)

But can x have more than one column when y is also supplied? Or do I have to use a different function?
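Stated as a formula, the target is one Pearson coefficient per gene: for gene j (column j of x) across the n = 30 cell lines, with x̄_j the mean of column j and ȳ the mean of y,

```latex
r_j = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(y_i - \bar{y})}
           {\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad j = 1, \dots, 2000
```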


Solution

  • Use apply() to run cor() over each column of x, passing y as the second argument:

    apply( x , 2 , cor , y = y )
    

    A reproducible example

    #  For reproducible data
    set.seed(1)
    
    #  3 x 4 matrix
    x <- matrix( runif(12) , nrow = 3 )
    #          [,1]      [,2]      [,3]       [,4]
    #[1,] 0.2655087 0.9082078 0.9446753 0.06178627
    #[2,] 0.3721239 0.2016819 0.6607978 0.20597457
    #[3,] 0.5728534 0.8983897 0.6291140 0.17655675
    
    # Length 3 vector
    y <- runif(3)
    #[1] 0.6870228 0.3841037 0.7698414
    
    # Length-4 output vector: one r per column of x
    apply( x , 2 , cor , y = y )
    #[1]  0.3712437  0.9764443  0.2249998 -0.4903723
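A minimal sketch at the scale of the question, under stated assumptions: the data are simulated (random normal values, seed 42) with the described shape of 30 cell lines by 2000 genes, not real expression data. It checks the apply() approach against two alternatives: base R's cor(x, y), which, given a matrix and a vector, correlates each column of x with y and returns a 2000 x 1 matrix, and a direct implementation of the Pearson formula. Because rcorr() also reports P-values, the last step computes per-gene p-values with cor.test(). The variable names (r_apply, r_vec, r_manual, p_vals) are illustrative only.

```r
# Simulated placeholder data shaped like the question:
# 30 cell lines (rows) x 2000 genes (columns), plus a length-30 outcome.
set.seed(42)
n_lines <- 30
n_genes <- 2000
x <- matrix(rnorm(n_lines * n_genes), nrow = n_lines)
y <- rnorm(n_lines)

# 1. Column-wise apply, as in the answer: one r per gene
r_apply <- apply(x, 2, cor, y = y)

# 2. Vectorised alternative: cor() with a matrix and a vector correlates
#    every column of x with y and returns an n_genes x 1 matrix;
#    drop() removes the single column to give a plain vector
r_vec <- drop(cor(x, y))

# 3. The Pearson formula written out: centre each column of x and y,
#    then divide the cross-products by the product of the norms.
#    xc * yc recycles yc down each column because length(yc) == nrow(xc).
xc <- sweep(x, 2, colMeans(x))
yc <- y - mean(y)
r_manual <- colSums(xc * yc) / (sqrt(colSums(xc^2)) * sqrt(sum(yc^2)))

# Per-gene p-values (two-sided Pearson test of r = 0, the cor.test default)
p_vals <- apply(x, 2, function(g) cor.test(g, y)$p.value)

length(r_apply)                # one coefficient per gene
all.equal(r_apply, r_vec)      # the three approaches agree
all.equal(r_apply, r_manual)
```

The single cor(x, y) call avoids calling cor() once per gene, so it is the simpler choice when only r is needed; apply() is more flexible when you want something else per column, such as the cor.test() p-values.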