
Correlating non-zero values in a set of columns to one final column in R


I have a matrix of data with 300 rows and 2010 columns. Every column except the last contains a mixture of zeros and non-zero count data; the final column contains measurements of a variable that I want to correlate each of the other columns with. I would like to compute these correlations using only the non-zero values of the first 2009 columns. (I have already computed the correlations with the zeros included and want to compare the two results.) Is there a way to modify the following code so that each correlation is based ONLY on the non-zero values in that column?

> nrow(cor5.mat)
[1] 300

> ncol(cor5.mat)
[1] 2010

# the last column is named "Smoking"

out5 <- as.data.frame(cor(cor5.mat, cor5.mat[, "Smoking"]))
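To experiment without the original data, the sketch below builds a small, made-up stand-in for cor5.mat (the data frame `toy`, its columns `a`, `b`, `c` and all values are hypothetical; only the column name `Smoking` comes from the question) and computes the baseline correlation with the zeros included. It selects the Smoking column with `[, "Smoking"]`, which works for both matrices and data frames, whereas `$` works only on data frames.

```r
# Hypothetical toy stand-in for cor5.mat: three count columns containing zeros,
# plus a final "Smoking" measurement column (all values are made up).
toy <- data.frame(
  a       = c(0, 2, 3, 0, 5, 6),
  b       = c(1, 0, 0, 4, 2, 3),
  c       = c(0, 0, 1, 1, 0, 2),
  Smoking = c(1.0, 2.1, 2.9, 3.8, 5.2, 6.1)
)

# Correlation of every column (Smoking included) with Smoking, zeros included.
# Indexing by name works for matrices and data frames alike.
out_with_zeros <- as.data.frame(cor(toy, toy[, "Smoking"]))
print(out_with_zeros)
```

The last row is Smoking correlated with itself, so it is 1; the other rows are the baseline to compare against the non-zero-only results.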

Solution

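# copy the data and turn every zero into NA
cor5.mat_1 <- cor5.mat
cor5.mat_1[cor5.mat_1 == 0] <- NA

# correlate each of the first 2009 columns with the original Smoking column
# (zeros in Smoking are kept), using for each column only the rows where it is non-zero
cor(cor5.mat_1[, -ncol(cor5.mat_1)], cor5.mat[, "Smoking"], use = "pairwise.complete.obs")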
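As a sanity check, the zero-to-NA approach can be compared with an explicit per-column loop on made-up toy data (the data frame `toy`, its values, and the names `counts`, `nonzero_cor` and `r_pairwise` are hypothetical). For each count column, the loop keeps only the rows where that column is non-zero and correlates them with the matching Smoking values. It also records how many rows each correlation rests on, since a correlation computed from only a handful of non-zero values says little.

```r
# Hypothetical toy data standing in for cor5.mat (values are made up).
toy <- data.frame(
  a       = c(0, 2, 3, 0, 5, 6),
  b       = c(1, 0, 0, 4, 2, 3),
  c       = c(0, 0, 1, 1, 0, 2),
  Smoking = c(1.0, 2.1, 2.9, 3.8, 5.2, 6.1)
)

smoking <- toy[, "Smoking"]
counts  <- toy[, -ncol(toy)]   # every column except the last

# Explicit version: for each count column, keep only its non-zero rows and
# correlate those values with Smoking on the same rows. n = rows used.
nonzero_cor <- t(sapply(counts, function(x) {
  keep <- x != 0
  c(r = if (sum(keep) >= 2) cor(x[keep], smoking[keep]) else NA_real_,
    n = sum(keep))
}))

# The zero-to-NA approach, restricted to the count columns.
counts_na <- counts
counts_na[counts_na == 0] <- NA
r_pairwise <- cor(counts_na, smoking, use = "pairwise.complete.obs")

print(nonzero_cor)
print(all.equal(unname(nonzero_cor[, "r"]), unname(r_pairwise[, 1])))
```

With `use = "pairwise.complete.obs"`, `cor()` drops incomplete rows separately for each pair of variables, so the two results should agree up to floating-point rounding. The loop returns NA for any column with fewer than two non-zero values.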