
R covariance matrix and correlation matrix


Hello, I am using the dystrophy data from the ipred package. I've used subset to separate the carriers from the normals:

data("dystrophy", package = "ipred")  # load the data set from ipred
carrier <- subset(dystrophy, Class == "carrier")
normal <- subset(dystrophy, Class == "normal")

and I've reduced this data by selecting only the patients with 1 visit to the hospital:

carrier <- subset(carrier, OBS == 1)
normal <- subset(normal, OBS == 1)

Now I would like to practice calculating the mean vector, the covariance matrix and the correlation matrix of the proteins, separately for each group (the Class factor).

I've tried cor and cov, but I think I am doing something wrong. Any help would be appreciated. Thanks!


Solution

  • This may get you started. Using your variables, you can get the means for each of the proteins using:

    sapply(carrier[, 6:9], mean, na.rm = TRUE)
    sapply(normal[, 6:9], mean, na.rm = TRUE)
    

    For the correlation and covariance you can use:

    cor(carrier[,6:9], use="pairwise.complete.obs")
    cor(normal[,6:9], use="pairwise.complete.obs")
    
    cov(carrier[,6:9], use="pairwise.complete.obs")
    cov(normal[,6:9], use="pairwise.complete.obs")
    

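    The 6:9 index restricts the computation to columns 6 through 9, the protein measurements, so that other variables such as age are left out. The use="pairwise.complete.obs" argument handles missing values: each pairwise covariance or correlation is computed from the rows where both variables are observed.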
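
If you would rather not build one data frame per group by hand, the same three quantities can be computed for every level of Class in one pass. The following is only a sketch: `toy` is a small made-up data frame standing in for the filtered dystrophy data (the values are invented, and the column names CK, H, PK, LD and Class are assumed to match the real data), and `proteins`, `by_class`, `means`, `covs` and `cors` are names chosen here for illustration.

```r
# Toy stand-in for the filtered dystrophy data. The values are invented
# for illustration; only the column names are assumed to match the real data.
toy <- data.frame(
  Class = factor(c("carrier", "carrier", "carrier", "carrier",
                   "normal", "normal", "normal", "normal")),
  CK = c(167, 104, 30, 44, 52, 20, 28, 30),
  H  = c(89.0, 81.0, 108.0, 104.0, 83.5, 77.0, 86.5, 104.0),
  PK = c(25.6, 26.8, NA, 17.4, 10.9, 11.0, 13.2, 22.6),
  LD = c(364, 245, 284, 172, 176, 200, 171, 230)
)

# Select the protein columns by name rather than by position.
proteins <- c("CK", "H", "PK", "LD")

# Split the protein columns into one data frame per Class level.
by_class <- split(toy[, proteins], toy$Class)

# Mean vector, covariance matrix and correlation matrix for each group.
means <- lapply(by_class, colMeans, na.rm = TRUE)
covs  <- lapply(by_class, cov, use = "pairwise.complete.obs")
cors  <- lapply(by_class, cor, use = "pairwise.complete.obs")

means$carrier  # mean vector for the carriers
cors$normal    # correlation matrix for the normals
```

With the real data, replacing `toy` by the filtered dystrophy data frame should give the per-group results directly. One caveat on the design: with pairwise deletion each entry of the matrix may be based on a different set of rows, so the result is not guaranteed to be positive semi-definite; `use = "complete.obs"` drops every row that has any missing value instead.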