
Covariance matrices by group, lots of NA


This is a follow-up question to my earlier post (covariance matrix by group) about a large data set. I have 6 variables (HML, RML, FML, TML, HFD, and BIB), and I am trying to create a covariance matrix of them for each group (based on the variable Group). However, these 6 variables (though not Group) have a lot of missing data, and I need to be able to use that data in the analysis: removing or omitting incomplete rows is not a good option for this research.

I narrowed the data set down to a matrix of just the variables of interest with:

    MMatrix = MMatrix2[1:2187, 4:10]

This worked fine for calculating an overall covariance matrix with:

    cov(MMatrix, use = "pairwise.complete.obs", method = "pearson")
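To illustrate what use = "pairwise.complete.obs" changes, here is a minimal sketch on a small made-up matrix (the values and the column names x, y and z are hypothetical, not the poster's data). With the default use = "everything", any entry involving a column that contains an NA comes out as NA, while pairwise deletion computes each entry from the rows where both columns are observed.

```r
# Hypothetical 5 x 3 matrix with two missing values
m <- cbind(x = c(1, 2, 3, 4, 5),
           y = c(2, NA, 6, 8, 10),
           z = c(5, 3, NA, 1, 0))

# Default use = "everything": entries whose columns contain an NA become NA
c_all <- cov(m)

# Pairwise deletion: each entry uses only the rows where both columns are observed
c_pair <- cov(m, use = "pairwise.complete.obs", method = "pearson")

print(c_all)
print(c_pair)
```

Note that pairwise deletion can produce a matrix that is not positive semi-definite, since different entries are based on different subsets of rows.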

To get the covariance matrices listed by group, I converted the data matrix into a data frame (so I could use the $ operator) with:

    CovDataM <- as.data.frame(MMatrix)

I then used the following suggested code to get covariances by group, but it keeps returning NULL:

    cov.list <- lapply(unique(CovDataM$group), function(x) cov(CovDataM[CovDataM$group == x, -1]))
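One possibility worth ruling out (an assumption, since the actual column names of CovDataM aren't shown): the question calls the grouping variable Group, but the code asks for CovDataM$group. Column names in R are case-sensitive, and `$` returns NULL for a name that isn't there; the same would happen if the group column was not among the columns kept in MMatrix. A minimal sketch with a made-up data frame (the name df and its values are hypothetical):

```r
# Hypothetical stand-in for the poster's data; note the capital G in "Group"
df <- data.frame(Group = c("a", "a", "b", "b"),
                 HML = c(1.2, NA, 3.4, 2.2))

# `$` is case-sensitive, so asking for "group" finds no column and gives NULL
print(df$group)

# unique(NULL) is NULL, and lapply() over NULL returns an empty list
# without ever calling the function (so cov() would never run)
res <- lapply(unique(df$group), function(x) stop("never reached"))
print(length(res))

# Checking the real column names first avoids this
print(names(df))
```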

I figured this was because of my NAs, so I tried adding use = "pairwise.complete.obs", and also use = "na.or.complete" (when desperate), to the end of the code, but it still returned only NULLs. I read somewhere that "pairwise.complete.obs" can only be used with method = "pearson", but adding that at the end didn't make a difference either. I need covariance matrices of these variables by group, including all the available data if possible, and I am thoroughly stuck.
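On where use = ... has to go: extra arguments given to lapply() after the function are passed to that function, not to the cov() call inside it, so with function(x) cov(...) they are not forwarded to cov(). A minimal sketch on a made-up data frame (the names d, group, v1 and v2 are hypothetical) showing two placements that do reach cov():

```r
# Hypothetical data frame with a grouping column and some NAs
d <- data.frame(group = rep(c("a", "b"), each = 4),
                v1 = c(1, 2, NA, 4, 5, 6, 7, 9),
                v2 = c(2, 4, 6, NA, 1, 3, 2, 5))

# Option 1: put use= inside the cov() call itself
cov.list <- lapply(unique(d$group), function(g)
  cov(d[d$group == g, c("v1", "v2")], use = "pairwise.complete.obs"))

# Option 2: give the function a ... so extra lapply() arguments reach cov()
cov.list2 <- lapply(unique(d$group),
                    function(g, ...) cov(d[d$group == g, c("v1", "v2")], ...),
                    use = "pairwise.complete.obs")

print(cov.list[[1]])
```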


Solution

  • Your problem seems to be that lapply is not iterating over your groups the way you expect. Run this code (which I hope is close enough to yours):

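    # Toy 15 x 5 matrix with a few NAs, all in the first group's rows
    CovData <- matrix(1:75, 15)
    CovData[3,4] <- NA
    CovData[1,3] <- NA
    CovData[4,2] <- NA
    # Add a grouping column: rows 1-5 are "a", 6-10 are "b", 11-15 are "c"
    CovDataM <- data.frame(CovData, "group" = c(rep("a",5),rep("b",5),rep("c",5)))
    
    colnames(CovDataM) <- c("a","b","c","d","e", "group")
    # Print each group label that lapply iterates over
    lapply(unique(as.character(CovDataM$group)), function(x) print(x))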
    

    You can see what lapply is actually iterating over, which may not be what you intend. The NAs don't appear to be the problem. When I run:

    by(CovDataM[ ,1:5], CovDataM$group, cov, use = "pairwise.complete.obs", method = "pearson")
    

    it returns one covariance matrix per group, with the NAs handled pairwise within each group. Hopefully that generalizes to your problem.
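If a plain named list (like the cov.list the question was after) is more convenient than the object by() returns, one alternative is split() followed by lapply(). This is a sketch on the same toy data as the answer; it assumes the grouping column is literally named group and that the variables of interest are the first five columns. Because the function passed to lapply() here is cov itself, the use= and method= arguments are forwarded straight to cov().

```r
# Same toy data as the answer (hypothetical, not the poster's data)
CovData <- matrix(1:75, 15)
CovData[3, 4] <- NA
CovData[1, 3] <- NA
CovData[4, 2] <- NA
CovDataM <- data.frame(CovData, group = rep(c("a", "b", "c"), each = 5))
colnames(CovDataM) <- c("a", "b", "c", "d", "e", "group")

# split() gives one data frame per group; lapply() then passes
# use= and method= on to cov() for each piece
cov.list <- lapply(split(CovDataM[, 1:5], CovDataM$group), cov,
                   use = "pairwise.complete.obs", method = "pearson")

print(names(cov.list))
print(cov.list[["a"]])
```

Compared with by(), this gives an ordinary list named by group, so cov.list[["a"]] pulls out one group's matrix directly; which form is handier probably depends on what you do with the matrices next.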