I want to simulate a set of categorical variables that correlate with a simulated numerical variable. More specifically, I have a variable age, defined as
age <- rnorm(n=1000, mean=35, sd=9)
and I want to simulate another variable, class, in which higher age makes for higher class. Can anyone point me in the right direction? Thanks in advance!
As I understand it, if a correlates with b, then a and b are (at least approximately) linearly related, so a can be represented as a linear function of b. To make the generated variable random rather than an exact function of b, add random noise to that linear function.
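Written out, the idea is the linear model below. The symbols are notation introduced here for illustration and are not from the original post: β is the slope, ε is noise independent of age, and σ denotes a standard deviation. For β > 0 the second expression gives the correlation the model implies, so a larger noise standard deviation lowers the correlation and a smaller one pushes it toward 1.

```latex
\[
\text{class} = \beta \cdot \text{age} + \varepsilon,
\qquad
\operatorname{cor}(\text{age}, \text{class})
  = \frac{\beta\,\sigma_{\text{age}}}
         {\sqrt{\beta^{2}\sigma_{\text{age}}^{2} + \sigma_{\varepsilon}^{2}}}
\]
```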
Here is one way of doing that:
set.seed(1)
age <- rnorm(n = 10, mean = 35, sd = 9)
# Slope: any finite min and max work; keep it positive so that class rises with age
beta <- runif(1, min = 1, max = 5)
# Linear function of age plus noise; any other mean and sd can be used for the noise
class <- beta * age + rnorm(length(age), mean = 0, sd = 2)
# Check correlation between age and class
cor(age, class)
#[1] 0.9994416
# Compare sorted ages with sorted classes
# (each column is sorted on its own, so a row is not necessarily an original pair)
data.frame(sort(age), sort(class))
   sort.age. sort.class.
1   27.47934    129.6408
2   27.61578    131.3707
3   29.36192    137.5428
4   32.25150    152.3856
5   36.65279    171.3957
6   37.96557    179.0890
7   39.38686    184.8634
8   40.18203    187.9404
9   41.64492    198.2192
10  49.35753    233.2981
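The class above is numeric, while the question asks for a categorical variable. One possible way to get there, sketched under assumptions, is to treat the noisy linear value as a latent score and cut it into ordered bins. The names score and class_cat, the noise sd of 5, the use of quartiles, and the four labels are all illustrative choices, not part of the original question or answer.

```r
set.seed(1)
age <- rnorm(n = 1000, mean = 35, sd = 9)

# Latent numeric score: linear in age plus noise (assumed sd = 5).
# A larger noise sd weakens the link between age and the final class.
score <- age + rnorm(length(age), mean = 0, sd = 5)

# Cut the score at its quartiles to get four ordered categories.
# The labels are hypothetical placeholders.
breaks <- quantile(score, probs = seq(0, 1, by = 0.25))
class_cat <- cut(score, breaks = breaks,
                 labels = c("low", "lower-mid", "upper-mid", "high"),
                 include.lowest = TRUE, ordered_result = TRUE)

# Mean age per class; it should rise from "low" to "high".
tapply(age, class_cat, mean)

# Rank-based correlation between age and the ordered class codes.
cor(age, as.integer(class_cat), method = "spearman")
```

Because the bins are ordered, a rank-based measure such as Spearman's correlation is a more natural check than Pearson's here. Changing the probs vector gives unequal class sizes, and changing the noise sd tunes how strongly class follows age.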