I want to perform a partial correlation analysis among multiple columns, controlling for multiple covariates, and then extract the r and p-values. My real data have some missing values.
I found that this answer (Pairwise partial correlation of a matrix, controlling by one variable) might be useful, so I adapted its method for my code. Because my data have missing values, I cannot use ppcor::pcor.test(), whose documentation states 'Missing values are not allowed'.
Here I use the built-in dataset mtcars to demonstrate the problem.
# load the "ggm" package to perform partial correlation analysis
library(ggm)
# take the first 8 columns of mtcars and set some data points to missing
mydata <- cbind(mtcars[1:8])
mydata[4:10,3] <- rep(NA,7)
mydata[1:5,4] <- NA
# perform partial correlation analysis among the first 6 columns with the last two columns as covariates
sapply(1:(ncol(mydata)-2), function(x) sapply(1:(ncol(mydata)-2), function(y) {
  if (x == y) 1
  else ggm::pcor(c(mydata[,x], mydata[,y], mydata[,7], mydata[,8]), var(mydata))
}))
# error:
Error in S[u, u] : subscript out of bounds
I get this error at this step, so I can neither compute the partial correlations nor extract the r and p-values.
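The error message itself points at the cause: the failing expression is S[u, u], which suggests that pcor uses its first argument u to index the covariance matrix S. Assuming that reading, a quick base-R check on the same mydata (no ggm needed) shows what the question's call passes as u for x = 1 and y = 2: 128 data values rather than four column positions, including values far larger than the 8 columns of S.

```r
# Rebuild the question's data: first 8 columns of mtcars, with some NAs
mydata <- mtcars[, 1:8]
mydata[4:10, 3] <- NA
mydata[1:5, 4] <- NA

S <- var(mydata)  # 8 x 8 covariance matrix (NA wherever column 3 or 4 is involved)

# What the question passes as the first argument of pcor for x = 1, y = 2
u <- c(mydata[, 1], mydata[, 2], mydata[, 7], mydata[, 8])
length(u)  # 128 data values, not 4 column positions
max(u)     # 33.9, far beyond ncol(S) = 8

# Indexing the 8 x 8 matrix with those values reproduces the error message
msg <- tryCatch(S[u, u], error = conditionMessage)
msg  # "subscript out of bounds"

# Indexing with column positions selects the intended 4 x 4 submatrix
S[c(1, 2, 7, 8), c(1, 2, 7, 8)]
```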
Many thanks for your help!
Ella
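You don't need to pass the column values to the pcor function. Its first argument is used to index the covariance matrix (that is the S[u, u] in the error message), so pass column numbers or column names instead. Try: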
sapply(1:(ncol(mydata)-2), function(x) sapply(1:(ncol(mydata)-2), function(y) {
  if (x == y) 1
  else ggm::pcor(c(x, y, 7, 8), var(mydata))
}))
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1.0000000 -0.7208025 NA NA 0.5717984 -0.8260219
#[2,] -0.7208025 1.0000000 NA NA -0.6969510 0.7414846
#[3,] NA NA 1 NA NA NA
#[4,] NA NA NA 1 NA NA
#[5,] 0.5717984 -0.6969510 NA NA 1.0000000 -0.5510354
#[6,] -0.8260219 0.7414846 NA NA -0.5510354 1.0000000
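The NA cells are a separate issue from the indexing error: var(mydata) uses all rows by default (use = "everything"), so every covariance involving column 3 or 4 is NA, and so is every partial correlation built from it. The output above also gives r only. The following is a minimal sketch for getting both r and p-values with missing data, assuming that dropping incomplete rows separately for each pair is acceptable for your analysis (var(mydata, use = "pairwise.complete.obs") is another option, but it can produce a covariance matrix that is not positive definite). It uses base R only: the partial correlation comes from the inverse of the covariance matrix of the pair plus covariates, and the p-value from the usual t-test with n - 2 - q degrees of freedom, where q is the number of covariates. The helper name partial_r_p is hypothetical. If you prefer to stay within ggm, its pcor.test(r, q, n) can supply the p-value for a given r.

```r
# Sketch: partial correlations (r) and p-values among columns 1-6 of the
# question's data, controlling for columns 7 and 8, using base R only.
# Assumption: for each pair, rows with NA in that pair or in the covariates
# are dropped (complete cases per pair).
mydata <- mtcars[, 1:8]
mydata[4:10, 3] <- NA
mydata[1:5, 4] <- NA

covs <- c(7, 8)                               # covariate columns (qsec, vs)
vars <- setdiff(seq_len(ncol(mydata)), covs)  # columns to correlate (1-6)

# Hypothetical helper (not part of ggm or ppcor): partial r and p-value
partial_r_p <- function(data, x, y, covs) {
  cols <- c(x, y, covs)
  ok <- complete.cases(data[, cols])          # rows complete on these columns
  n <- sum(ok)
  q <- length(covs)
  P <- solve(cov(data[ok, cols]))             # precision (inverse covariance) matrix
  r <- -P[1, 2] / sqrt(P[1, 1] * P[2, 2])     # partial correlation of x and y
  df <- n - 2 - q
  tstat <- r * sqrt(df / (1 - r^2))
  p <- 2 * pt(-abs(tstat), df)                # two-sided p-value
  c(r = r, p = p, n = n)
}

labs <- names(mydata)[vars]
r_mat <- matrix(NA_real_, length(vars), length(vars), dimnames = list(labs, labs))
p_mat <- r_mat                                # diagonal p-values stay NA
for (i in seq_along(vars)) {
  for (j in seq_along(vars)) {
    if (i == j) {
      r_mat[i, j] <- 1
    } else {
      res <- partial_r_p(mydata, vars[i], vars[j], covs)
      r_mat[i, j] <- res[["r"]]
      p_mat[i, j] <- res[["p"]]
    }
  }
}

round(r_mat, 3)
round(p_mat, 4)
```

Because each pair uses its own complete rows, n differs between cells (all 32 rows for pairs that involve neither column 3 nor column 4, fewer otherwise), so the p-values are not all based on the same sample size.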