Tags: r, simulation, correlation, categorical-data, continuous

Simulation of correlated categorical and continuous data


I want to simulate correlated categorical and continuous data. How can I do that in R?

# For example, how can I simulate the data so that these two variables are correlated?
x <- sample(LETTERS[1:4], 1000, replace = TRUE, prob = c(0.1, 0.2, 0.65, 0.05))  # Categorical variable
y <- runif(1000, 1, 5)  # Continuous variable

Any ideas will be greatly appreciated!


Solution

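  • Here's a method using copulas: draw pairs of dependent uniforms from a Clayton copula, then map one to the categories through their cumulative probabilities and the other to the continuous range through the quantile function. Use larger values of alpha for stronger dependence (for the Clayton copula, Kendall's tau between the two uniforms is alpha / (alpha + 2)).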

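    library(copula)
    n <- 1000
    alpha <- 5  # Clayton dependence parameter; larger means stronger dependence
    # Draw n pairs of dependent Uniform(0, 1) values
    u <- rCopula(n, claytonCopula(alpha))
    u1 <- u[,1]
    u2 <- u[,2]
    # Cut u1 at the cumulative probabilities so that P(A) = 0.1, P(B) = 0.2, P(C) = 0.65, P(D) = 0.05
    x <- ifelse(u1 < 0.1, "A", 
         ifelse(u1 < 0.3, "B", 
         ifelse(u1 < 0.95, "C", "D")))
    # Transform u2 to Uniform(1, 5) with the quantile function
    y <- qunif(u2, 1, 5)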
    plot(factor(x), y)  # box plots of y for each category
    plot(factor(x))  # bar chart of category counts
    plot(density(y))  # kernel density estimate of y

    Created on 2021-02-21 by the reprex package (v0.3.0)
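
The same idea can be sketched in base R, which makes it easier to see what the copula step does and to check the result. This is only an illustration under stated assumptions, not the answer's own code: `r_clayton` is a hypothetical helper that samples the Clayton copula by conditional inversion instead of calling `rCopula`, and `target_tau` is an assumed target for Kendall's tau, converted to `alpha` with the Clayton relation tau = alpha / (alpha + 2).

```r
# Sketch: correlated categorical/continuous data without the copula package.
set.seed(1)                                      # arbitrary seed, for reproducibility
n          <- 1000
target_tau <- 0.7                                # assumed target Kendall's tau
alpha      <- 2 * target_tau / (1 - target_tau)  # inverts tau = alpha / (alpha + 2)

# Hypothetical helper: sample a Clayton copula by conditional inversion.
# u1 is uniform; u2 solves dC/du1 = w for an independent uniform w.
r_clayton <- function(n, alpha) {
  u1 <- runif(n)
  w  <- runif(n)
  u2 <- (u1^(-alpha) * (w^(-alpha / (1 + alpha)) - 1) + 1)^(-1 / alpha)
  cbind(u1, u2)
}

u <- r_clayton(n, alpha)

# Map the first uniform to categories using the cumulative probabilities.
x <- cut(u[, 1], breaks = c(0, 0.1, 0.3, 0.95, 1),
         labels = LETTERS[1:4], include.lowest = TRUE)

# Map the second uniform to Uniform(1, 5) with the quantile function.
y <- qunif(u[, 2], 1, 5)

# Checks: dependence of the uniforms, category proportions, and mean of y per category.
tau_hat <- cor(u[, 1], u[, 2], method = "kendall")
props   <- prop.table(table(x))
means   <- tapply(y, x, mean)
```

Here `tau_hat` should land near `target_tau`, `props` near the requested 0.1, 0.2, 0.65 and 0.05, and `means` should increase from A to D. The association between `x` and `y` themselves is not exactly `target_tau`, because `x` is a coarsened version of the first uniform.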