Hello, I am using the dystrophy data set from the ipred package. I've used subset() to separate the carriers from the normal patients:
carrier = subset(dystrophy, Class == "carrier")
normal = subset(dystrophy, Class == "normal")
and I've reduced the data by selecting only the patients with one visit to the hospital:
carrier = subset(carrier, OBS == 1)
normal = subset(normal, OBS == 1)
Now I would like to practice calculating the mean vector, the covariance matrix and the correlation matrix of the proteins, separately for each group (the Class factor).
I've tried cor and cov, but I think I am doing something wrong. Any help would be appreciated. Thanks!
This may get you started. Using your variables, you can get the mean of each protein with:
sapply(carrier[, 6:9], mean, na.rm = TRUE)
sapply(normal[, 6:9], mean, na.rm = TRUE)
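If you would rather not build one subset per class, a minimal sketch of the same mean-vector computation for all groups at once uses split() and colMeans(). It runs on a small made-up data frame (the columns p1, p2 and all values are hypothetical stand-ins, not the real dystrophy data), so it works without ipred installed:

```r
# Hypothetical toy data standing in for the dystrophy proteins (NOT real values):
# two protein-like columns, a Class factor, and one missing value.
toy <- data.frame(
  p1    = c(52, 20, 28, 30, 40, 59),
  p2    = c(83.5, 77, 86.5, 104, 83, NA),
  Class = factor(c("carrier", "carrier", "carrier", "normal", "normal", "normal"))
)
protein_cols <- c("p1", "p2")

# split() returns one data frame per level of Class; colMeans() then
# gives the mean vector of the protein columns within each group,
# ignoring the missing value thanks to na.rm = TRUE.
group_means <- sapply(split(toy[, protein_cols], toy$Class),
                      colMeans, na.rm = TRUE)

# Rows are proteins, columns are groups.
print(group_means)
```

On the real data, assuming you first keep only the OBS == 1 rows in a single data frame (called one_visit here, a hypothetical name), the analogous call would be sapply(split(one_visit[, 6:9], one_visit$Class), colMeans, na.rm = TRUE).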
For the correlation and covariance matrices you can use:
cor(carrier[,6:9], use="pairwise.complete.obs")
cor(normal[,6:9], use="pairwise.complete.obs")
cov(carrier[,6:9], use="pairwise.complete.obs")
cov(normal[,6:9], use="pairwise.complete.obs")
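The same split() idea extends to the covariance and correlation matrices: lapply() applies cov() or cor() to each group's data frame and returns a named list with one matrix per class. Again, this is only a sketch on hypothetical toy data (columns p1 to p3 and their values are made up):

```r
# Hypothetical toy data (NOT the real dystrophy values): three protein-like
# columns, four rows per class, and one missing value in the "normal" group.
toy <- data.frame(
  p1    = c(52, 20, 28, 30, 40, 59, 25, 33),
  p2    = c(83.5, 77, 86.5, 104, 83, NA, 91, 88),
  p3    = c(10.9, 11.0, 13.2, 22.6, 14.1, 9.5, 12.0, 16.8),
  Class = factor(rep(c("carrier", "normal"), each = 4))
)
protein_cols <- c("p1", "p2", "p3")

# One data frame of protein columns per level of Class.
by_class <- split(toy[, protein_cols], toy$Class)

# Named lists of 3 x 3 matrices, one entry per group.
cor_by_class <- lapply(by_class, cor, use = "pairwise.complete.obs")
cov_by_class <- lapply(by_class, cov, use = "pairwise.complete.obs")

print(cor_by_class$carrier)
print(cov_by_class$normal)
```

Here cor_by_class$carrier plays the same role as cor(carrier[, 6:9], use = "pairwise.complete.obs") above, without needing a separate object per group.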
The 6:9 part restricts the computation to the protein columns, so other variables such as age are not included. The use="pairwise.complete.obs" part handles the missing values: each entry of the matrix is computed from the rows where both variables of that pair are observed.
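To see what use="pairwise.complete.obs" changes in practice, here is a small sketch with made-up numbers comparing it with use="complete.obs", which instead drops every row containing any NA before computing anything:

```r
# Made-up data: only column b has a missing value (in row 5).
x <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(2, 4, 5, 4, NA),
  c = c(5, 3, 4, 1, 2)
)

# Pairwise: cor(a, c) uses all 5 rows, because neither a nor c is missing.
pw <- cor(x, use = "pairwise.complete.obs")

# Complete cases: row 5 is dropped for every pair, so cor(a, c) uses rows 1-4.
cc <- cor(x, use = "complete.obs")

pw["a", "c"]  # -0.8
cc["a", "c"]  # about -0.83
```

Pairwise deletion keeps more data per entry, but because different entries can be based on different rows, the resulting matrix is not guaranteed to be positive semi-definite; complete.obs avoids that at the cost of discarding rows.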