
Using cor.test() on columns with NAs


I'm trying to calculate the Spearman correlation of each column of one data frame with a single column from another. Every time I call cor.test(dfx, dfy, method = c("spearman"), na.action = "na.exclude"), it throws an error saying that x and y must have the same length. I have used cor() with use = "complete.obs" to do the same thing, which worked perfectly fine, but I also need the p-value for each correlation.

  Age Male (1,0) Female (1,0) Other (race) Caucasian (1,0)
1  83          1            0            0               1
2  91          1            0            0               1
3  87          1            0            0               1
4  89          0            1            0               1
5  78          1            0            0               1
6  84          0            1            0               1

Here's a sample of the 352x52 table. I'm comparing each of its columns to a single 352x1 column, and there are a few NAs throughout, so I'm trying to figure out how to handle them and still get the p-values reported.


Solution

  • Maybe try this. First, I make an example dfx and dfy with some missing values:

    set.seed(100)
    M = matrix(rnorm(352*53),ncol=53)
    #make some NAs
    M[sample(length(M),500)] = NA
    dfy = M[,1]
    dfx = M[,-1]
    

    Then use apply() to iterate over the columns of dfx. For each column, keep only the rows where neither that column nor dfy is NA, and run cor.test() on those:

    res = apply(dfx,2,function(i){
       compl = !is.na(i) & !is.na(dfy)
       unlist(cor.test(i[compl],dfy[compl],method="spearman")[c("estimate","p.value")])
    })
    
    res = t(res)
    
    head(res)
         estimate.rho   p.value
    [1,]  -0.03147103 0.5675366
    [2,]  -0.06137428 0.2596360
    [3,]  -0.06224493 0.2536336
    [4,]  -0.02586685 0.6354243
    [5,]   0.06105610 0.2642532
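
The same idea can be wrapped in a small helper that returns a labelled data frame. This is only a sketch under a few assumptions: the name spearman_by_column is hypothetical, the input is coerced with as.data.frame() so that column names are kept, and exact = FALSE is my own choice to request the asymptotic p-value, which avoids the ties warning that cor.test() gives on 0/1 columns like the ones in the question. Because of that choice, the p-values can differ slightly from the output shown above.

```r
# Hypothetical helper (not part of base R): Spearman rho and p-value of every
# column of `x` against the vector `y`, dropping incomplete pairs per column.
spearman_by_column <- function(x, y) {
  x <- as.data.frame(x)
  stopifnot(nrow(x) == length(y))
  rows <- lapply(x, function(col) {
    ok <- complete.cases(col, y)   # TRUE where both values are present
    # exact = FALSE: asymptotic p-value, no warning when the data contain ties
    ct <- cor.test(col[ok], y[ok], method = "spearman", exact = FALSE)
    data.frame(n = sum(ok), rho = unname(ct$estimate), p.value = ct$p.value)
  })
  data.frame(variable = names(x), do.call(rbind, rows), row.names = NULL)
}

# Same simulated data as in the answer above
set.seed(100)
M <- matrix(rnorm(352 * 53), ncol = 53)
M[sample(length(M), 500)] <- NA
dfy <- M[, 1]
dfx <- M[, -1]

res2 <- spearman_by_column(dfx, dfy)
head(res2)

# Sanity check: cor() with pairwise deletion should give the same estimates
rho_cor <- cor(dfx, dfy, use = "pairwise.complete.obs", method = "spearman")
all.equal(res2$rho, as.vector(rho_cor))
```

Note that use = "complete.obs" in cor() drops every row that has an NA in any column, whereas the per-column filtering here only drops rows that are incomplete for the pair being tested. The estimates from the two approaches can therefore differ, and "pairwise.complete.obs" is the setting that corresponds to this one.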