r simulation

Simulating variables based on other variables in R


I want to simulate a set of categorical variables that correlate with a simulated numerical variable. More specifically, I have a variable age, defined as age <- rnorm(n = 1000, mean = 35, sd = 9), and I want to simulate another variable, class, such that a higher age tends to give a higher class. Can anyone point me in the right direction? Thanks in advance!


Solution

  • As I understand it, if a correlates with b, then a and b are (approximately) linearly related, so a can be written as a linear function of b. To make the simulated variable random rather than a deterministic function of b, add random noise to that linear function.

    Here is one way of doing that:

    set.seed(1)
    age <- rnorm(n = 10, mean = 35, sd = 9)
    # Slope: any finite min and max work; keep it positive so that class increases with age.
    beta <- runif(1, min = 1, max = 5)
    # Linear function of age plus Gaussian noise; the noise mean and sd can be changed.
    class <- beta * age + rnorm(length(age), mean = 0, sd = 2)
    
    # Check correlation between age and class
    cor(age, class)
    #[1] 0.9994416
    
    # Compare the sorted values side by side (note: each column is sorted independently)
    data.frame(sort(age), sort(class))
    
       sort.age. sort.class.
    1   27.47934    129.6408
    2   27.61578    131.3707
    3   29.36192    137.5428
    4   32.25150    152.3856
    5   36.65279    171.3957
    6   37.96557    179.0890
    7   39.38686    184.8634
    8   40.18203    187.9404
    9   41.64492    198.2192
    10  49.35753    233.2981
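
The class simulated above is numeric, while the question asks for a categorical variable. One possible way to get there, sketched below, is to treat the linear-plus-noise value as a latent score and bin it into ordered levels with cut(). The names score and age_class, the four labels, the quartile cut points, beta = 1 and the noise sd of 5 are all assumptions made for illustration; they are not part of the original answer.

```r
# Sketch: derive an ordered categorical class from age plus noise.
set.seed(1)
n <- 1000
age <- rnorm(n, mean = 35, sd = 9)

# Latent numeric score: linear in age plus random noise.
# beta and the noise sd are illustrative assumptions;
# a larger noise sd gives a weaker association with age.
beta <- 1
score <- beta * age + rnorm(n, mean = 0, sd = 5)

# Cut the score at its quartiles to get four ordered levels.
# The labels are hypothetical placeholders.
breaks <- quantile(score, probs = seq(0, 1, by = 0.25))
age_class <- cut(score, breaks = breaks,
                 labels = c("low", "lower-middle", "upper-middle", "high"),
                 include.lowest = TRUE, ordered_result = TRUE)

# Inspect the result: class counts, mean age per class,
# and the rank correlation between age and the class level.
table(age_class)
tapply(age, age_class, mean)
cor(age, as.integer(age_class), method = "spearman")
```

The variable is named age_class rather than class to avoid masking the base R function class(). Using quantile-based breaks gives roughly equal-sized groups; fixed cut points could be used instead if the classes should have unequal sizes.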