Sampling from a multivariate distribution including gender in R

I'm trying to simulate a wider population from a small one in R as follows:

idata <- subset(data, select=c(WT, AGE, HT, BFP, SEX) )
M= cor(idata)
mu <- sapply(idata, mean)
sd <- sapply(idata, stdev)
sigma=cor2cov(M, sd)
simulation <- as.data.frame(mvrnorm(1000, mu, sigma))

But the problems is, for SEX, the code will consider a continuous distribution, while it has to be binary, and effects of sex has to be either fully considered (SEX==1), or not at all (SEX==0). I'd appreciate any help with this regard. Thanks

Solution

What you should do is consider that your data consists of two sub-populations, and then draw data from them, based on their proportions.

So, first estimate the proportions, pi_m and pi_f (= 1 - pi_m), which are the proportion of SEX == 0 and SEX == 1. This should be something like pi_m = sum(idata$SEX == 1)/ nrow(idata)

Then estimate parameters for the two populations, mu_f, mu_m, sigma_f and sigma_m, which are mean and covariance parameters for the two SEX populations (now without the SEX variable).

The first draw a random number r <- runif(1), if this is less than equal to pi_m then generate a sample from N(mu_m, sigma_s) else from N(mu_f, sigma_f).

You can do this step 1000 times to get 1000 samples from your distribution.

Of course, you can vector this, by first generating 1000 samples from runif. For example

n_m <- sum(runif(1000) <= pi_m)
n_f <- 1000 - n_m

X_m <- rmvnorm(n_m, mu_m, sigma_m)
X_f <- rmvnorm(n_f, mu_f, sigma_f)

X <- rbind(X_m, X_f)