
Constructing correlated variables


I have a variable with a given distribution (normal in the example below).

set.seed(32)    
var1 = rnorm(100,mean=0,sd=1)

I want to create a second variable (var2) that is correlated with var1, with a linear correlation coefficient (roughly or exactly) equal to "Corr". The slope of the regression between var1 and var2 should also be (roughly or exactly) equal to 1.

Corr = 0.3

How can I achieve this?

I wanted to do something like this:

decorelation = rnorm(100,mean=0,sd=1-Corr)
var2 = var1 + decorelation

But of course, when I run:

cor(var1,var2)

the result is not close to Corr!


Solution

  • I did something similar a while ago. Here is some code for three correlated variables; it can easily be generalized to more.

    First, create the target correlation matrix:

    cor_Matrix <- matrix(c(1.00, 0.90, 0.20,
                           0.90, 1.00, 0.40,
                           0.20, 0.40, 1.00),
                         nrow = 3, ncol = 3, byrow = TRUE)
    

    This can be any valid correlation matrix (symmetric, ones on the diagonal, positive semi-definite).

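    library(psych)

    # Principal components of the correlation matrix: keep all 3, no rotation
    fit <- principal(cor_Matrix, nfactors = 3, rotate = "none")

    fit$loadings

    # Extract the loadings as a plain 3 x 3 matrix
    loadings <- matrix(fit$loadings[1:3, 1:3], nrow = 3, ncol = 3, byrow = FALSE)
    loadings

    # Generate three independent standard normal variables, one per row (3000 cases)
    cases <- t(replicate(3, rnorm(3000)))

    # Mix the independent variables using the loadings, then put cases in rows
    multivar <- loadings %*% cases
    T_multivar <- t(multivar)

    sim_data <- as.data.frame(T_multivar)

    # Should be close to cor_Matrix
    cor(sim_data)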
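
    Why this works: with all three components kept and no rotation, the loadings matrix L should satisfy L %*% t(L) = cor_Matrix, so mixing independent standard normal draws with L gives variables whose population correlation matrix is cor_Matrix. Below is a minimal base-R sketch of the same idea. It assumes the unrotated loadings are the eigenvectors scaled by the square roots of the eigenvalues; the names eig, L and sim are illustrative and not part of the code above.

    ```r
    cor_Matrix <- matrix(c(1.00, 0.90, 0.20,
                           0.90, 1.00, 0.40,
                           0.20, 0.40, 1.00),
                         nrow = 3, ncol = 3, byrow = TRUE)

    # Eigendecomposition of the target correlation matrix
    eig <- eigen(cor_Matrix, symmetric = TRUE)

    # Loadings: eigenvectors scaled by sqrt(eigenvalues), so L %*% t(L) = cor_Matrix
    L <- eig$vectors %*% diag(sqrt(eig$values))
    max(abs(L %*% t(L) - cor_Matrix))  # numerically zero

    # Mix independent standard normals (one variable per row) with the loadings
    set.seed(1)
    cases <- matrix(rnorm(3 * 3000), nrow = 3)
    sim <- t(L %*% cases)

    cor(sim)  # close to cor_Matrix, up to sampling error
    ```

    The sample correlations only approximate the target; they get closer as the number of cases grows.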
    

    Again, this can be generalized. The approach in your question does not create a multivariate data set.
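
    For the two-variable case in the question, a slope of 1 means var2 = var1 + noise. If the noise is uncorrelated with var1, the correlation is sd(var1) / sqrt(var(var1) + var(noise)). Solving for Corr gives sd(noise) = sd(var1) * sqrt(1 / Corr^2 - 1), which is about 3.18 times sd(var1) for Corr = 0.3, not 1 - Corr. The sketch below works under those assumptions: the slope is taken to be that of the regression of var2 on var1, and the name noise is illustrative. It also removes the sample correlation between the noise and var1, so both targets are met exactly in the sample rather than only on average.

    ```r
    set.seed(32)
    var1 <- rnorm(100, mean = 0, sd = 1)
    Corr <- 0.3

    # Raw noise, with its sample correlation with var1 removed:
    # residuals from a regression on var1 are orthogonal to var1
    raw <- rnorm(100)
    noise <- residuals(lm(raw ~ var1))

    # Rescale so that sd(noise) = sd(var1) * sqrt(1 / Corr^2 - 1)
    noise <- noise * sd(var1) * sqrt(1 / Corr^2 - 1) / sd(noise)

    var2 <- var1 + noise

    cor(var1, var2)              # equals Corr up to floating-point error
    coef(lm(var2 ~ var1))[2]     # slope equals 1 up to floating-point error
    ```

    If an approximate match is enough, skipping the residuals step and using raw directly gives a correlation and slope that are right only on average.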