# R - How do I run a regression based on a correlation matrix rather than raw data?

I would like to run a regression based on a correlation matrix rather than raw data. I have looked at this post, but can't make sense of it. How do I do this in R?

Here is some code:

``````
# Correlation matrix.
MyMatrix <- matrix(
  c(1.0, 0.1, 0.5, 0.4,
    0.1, 1.0, 0.9, 0.3,
    0.5, 0.9, 1.0, 0.3,
    0.4, 0.3, 0.3, 1.0),
  nrow = 4,
  ncol = 4)

df <- as.data.frame(MyMatrix)

colnames(df) <- c("a", "b", "c", "d")

#Assume means and standard deviations as follows:
MEAN.a <- 4.00
MEAN.b <- 3.90
MEAN.c <- 4.10
MEAN.d <- 5.00
SD.a <- 1.01
SD.b <- 0.95
SD.c <- 0.99
SD.d <- 2.20

library(lavaan)
m1 <- 'd ~ a + b + c'
fit <- sem(m1, ????)
summary(fit, standardize=TRUE)
``````

Solution

• This should do it. First, convert your correlation matrix to a covariance matrix by pre- and post-multiplying it by a diagonal matrix of the standard deviations:

``````
MyMatrix <- matrix(
  c(1.0, 0.1, 0.5, 0.4,
    0.1, 1.0, 0.9, 0.3,
    0.5, 0.9, 1.0, 0.3,
    0.4, 0.3, 0.3, 1.0),
  nrow = 4,
  ncol = 4)
rownames(MyMatrix) <- colnames(MyMatrix) <- c("a", "b", "c", "d")

#Assume the following means and standard deviations:
MEAN.a <- 4.00
MEAN.b <- 3.90
MEAN.c <- 4.10
MEAN.d <- 5.00
SD.a <- 1.01
SD.b <- 0.95
SD.c <- 0.99
SD.d <- 2.20
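# Collect the standard deviations and means in the same order as the matrix.
s <- c(SD.a, SD.b, SD.c, SD.d)
m <- c(MEAN.a, MEAN.b, MEAN.c, MEAN.d)
# Covariance = D %*% R %*% D, where D is the diagonal matrix of standard deviations.
cov.mat <- diag(s) %*% MyMatrix %*% diag(s)
# The matrix product drops the dimnames, so restore them; lavaan matches variables by name.
rownames(cov.mat) <- colnames(cov.mat) <- rownames(MyMatrix)
names(m) <- rownames(MyMatrix)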
``````
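
Before fitting anything, it may be worth checking that the conversion behaved as intended. The following is a minimal sketch in base R, not part of the original answer; the names `R`, `cov_check` and `min_eig` are made up for illustration. It rebuilds the covariance matrix element-wise, which is equivalent to `diag(s) %*% R %*% diag(s)`, and confirms that the diagonal holds the variances, that `cov2cor()` recovers the correlations, and that all eigenvalues are positive, so the matrix is a valid covariance matrix.

```r
# Correlation matrix and standard deviations from the question.
vars <- c("a", "b", "c", "d")
R <- matrix(c(1.0, 0.1, 0.5, 0.4,
              0.1, 1.0, 0.9, 0.3,
              0.5, 0.9, 1.0, 0.3,
              0.4, 0.3, 0.3, 1.0),
            nrow = 4, ncol = 4,
            dimnames = list(vars, vars))
s <- c(a = 1.01, b = 0.95, c = 0.99, d = 2.20)

# Element-wise form of diag(s) %*% R %*% diag(s); this keeps the dimnames of R.
cov_check <- R * outer(s, s)

# 1. The diagonal should hold the variances (SD squared).
stopifnot(isTRUE(all.equal(diag(cov_check), s^2)))

# 2. Converting back should recover the original correlations.
stopifnot(isTRUE(all.equal(cov2cor(cov_check), R)))

# 3. All eigenvalues should be positive (positive definite matrix).
min_eig <- min(eigen(cov_check, symmetric = TRUE, only.values = TRUE)$values)
stopifnot(min_eig > 0)
```

With `b` and `c` correlated at 0.9, this matrix is close to singular, so the smallest eigenvalue is small but still positive. Typing a correlation matrix in by hand can easily produce one that is not positive definite, which is why the third check is useful.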

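Then you can use `lavaan` to estimate the model, along the lines of the post you mentioned in your question. Note that `sem()` needs the number of observations (`sample.nobs`) when it is given summary statistics instead of raw data. The sample size drives the standard errors and test statistics, and lavaan's default ML rescaling by (n - 1)/n also nudges the residual variance slightly; the regression coefficients themselves do not depend on it. I used 100 for the example, so replace it with your actual sample size.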

``````
library(lavaan)
m1 <- 'd ~ a + b + c'
fit <- sem(m1,
           sample.cov = cov.mat,
           sample.nobs = 100,
           sample.mean = m,
           meanstructure = TRUE)
summary(fit, standardized = TRUE)
# lavaan 0.6-6 ended normally after 44 iterations
#
# Estimator                                         ML
# Optimization method                           NLMINB
# Number of free parameters                          5
#
# Number of observations                           100
#
# Model Test User Model:
#
# Test statistic                                 0.000
# Degrees of freedom                                 0
#
# Parameter Estimates:
#
# Standard errors                             Standard
# Information                                 Expected
# Information saturated (h1) model          Structured
#
# Regressions:
#                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
# d ~
#   a                 6.317    0.095   66.531    0.000    6.317    2.900
#   b                12.737    0.201   63.509    0.000   12.737    5.500
#   c               -13.556    0.221  -61.307    0.000  -13.556   -6.100
#
# Intercepts:
#                 Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
# .d               -14.363    0.282  -50.850    0.000  -14.363   -6.562
#
# Variances:
#                 Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
# .d                 0.096    0.014    7.071    0.000    0.096    0.020
#
#

``````
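
As a cross-check, the same numbers can be reproduced in base R with the textbook formulas for regression from summary statistics. This is only a sketch under the assumptions stated in the comments, not a description of what `lavaan` does internally; the variable names (`beta_std`, `b`, `resid_var_ml`, `se`) are made up for illustration, and `n = 100` is the same arbitrary sample size used above.

```r
# Summary statistics from the question.
vars <- c("a", "b", "c", "d")
R <- matrix(c(1.0, 0.1, 0.5, 0.4,
              0.1, 1.0, 0.9, 0.3,
              0.5, 0.9, 1.0, 0.3,
              0.4, 0.3, 0.3, 1.0),
            nrow = 4, ncol = 4,
            dimnames = list(vars, vars))
s <- c(a = 1.01, b = 0.95, c = 0.99, d = 2.20)
m <- c(a = 4.00, b = 3.90, c = 4.10, d = 5.00)
x <- c("a", "b", "c")  # predictors
y <- "d"               # outcome
n <- 100               # assumed sample size, as in the lavaan call

# Standardized slopes: solve Rxx %*% beta = rxy.
beta_std <- as.vector(solve(R[x, x], R[x, y]))
names(beta_std) <- x

# Unstandardized slopes and intercept.
b <- beta_std * s[[y]] / s[x]
intercept <- m[[y]] - sum(b * m[x])

# R-squared and the ML residual variance (rescaled by (n - 1) / n).
r2 <- sum(beta_std * R[x, y])
resid_var_ml <- (1 - r2) * s[[y]]^2 * (n - 1) / n

# Large-sample standard errors of the slopes.
Sxx <- R[x, x] * outer(s[x], s[x])
se <- sqrt((1 - r2) * s[[y]]^2 * diag(solve(Sxx)) / n)

round(beta_std, 3)      # a 2.9, b 5.5, c -6.1
round(b, 3)             # a 6.317, b 12.737, c -13.556
round(intercept, 3)     # -14.363
round(resid_var_ml, 3)  # 0.096
round(se, 3)            # a 0.095, b 0.201, c 0.221
```

These values agree with the `Estimate`, `Std.all` and `Std.Err` columns in the output above. Standardized slopes far outside the range -1 to 1 and an R-squared of 0.98 are a consequence of the 0.9 correlation between `b` and `c`, so the estimates are very sensitive to small changes in the correlation matrix.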