I want to simulate correlated categorical and continuous data. How can I achieve that in R?
# For example, how can I simulate the data so that these two variables are correlated?
x <- sample( LETTERS[1:4], 1000, replace=TRUE, prob=c(0.1, 0.2, 0.65, 0.05) ) #Categorical variable
y <- runif(1000,1,5) #Continuous variable
Any ideas will be greatly appreciated!
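Here's a method using copulas. Larger values of alpha (the Clayton copula
parameter) give stronger dependence between the two variables. The marginal
distributions stay as specified in the question.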
library(copula)
n <- 1000
alpha <- 5
u <- rCopula(n, claytonCopula(alpha))
u1 <- u[,1]
u2 <- u[,2]
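# Cut u1 at the cumulative probabilities 0.1, 0.3, 0.95,
# so that P(A) = 0.1, P(B) = 0.2, P(C) = 0.65, P(D) = 0.05
x <- ifelse(u1 < 0.1, "A",
       ifelse(u1 < 0.3, "B",
         ifelse(u1 < 0.95, "C", "D")))
# Map u2 to Uniform(1, 5) through the quantile function
y <- qunif(u2, 1, 5)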
plot(factor(x), y)
plot(factor(x))
plot(density(y))
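To make "larger alpha, higher correlation" more concrete, here is a rough sketch of how one might choose alpha and check the result. For a Clayton copula, Kendall's tau of the underlying uniforms is alpha / (alpha + 2); `tau()` and `iTau()` from the copula package convert in both directions. The seed, the target tau of 0.5, and the use of `cut()` instead of nested `ifelse()` are my own choices for illustration. Because x is discretized, the tau measured between x and y will generally differ from the tau of the latent uniforms.

```r
library(copula)

set.seed(42)   # arbitrary seed, only for reproducibility
n <- 1000
alpha <- 5

# Kendall's tau implied by alpha for a Clayton copula: alpha / (alpha + 2)
tau_latent <- tau(claytonCopula(alpha))          # 5/7, about 0.714

# The reverse direction: alpha for a target tau (0.5 is just an example value)
alpha_for_half <- iTau(claytonCopula(), 0.5)     # 2

u <- rCopula(n, claytonCopula(alpha))

# Same cut points as above, written with cut(); levels are ordered A < B < C < D
x <- cut(u[, 1], breaks = c(0, 0.1, 0.3, 0.95, 1), labels = LETTERS[1:4])
y <- qunif(u[, 2], 1, 5)

# Marginal check: proportions should be close to 0.1, 0.2, 0.65, 0.05
prop.table(table(x))

# Dependence check: mean of y per category, and Kendall's tau between
# the category rank and y
tapply(y, x, mean)
cor(as.integer(x), y, method = "kendall")
```

With a positive alpha, the mean of y should increase from A to D, since low values of u1 tend to occur together with low values of u2.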
Created on 2021-02-21 by the reprex package (v0.3.0)