
Generate multivariate normal data with unequal sample sizes


I would like to generate multivariate random data, manipulating the sample size and variance, using MASS::mvrnorm (or rnorm, if that turns out to be the better tool). This is fairly straightforward on its own. The trick is that I intend to simulate a statistic that compares two groups of different sizes. Essentially, this creates a 3×3 design: three levels of paired sample sizes (e.g. [450,150], [300,300], [150,450]) crossed with three levels of paired variances (e.g. [1,1], [1,3], [1,10]).

    n <- c(450, 150, 300, 300, 150, 450) # sample sizes, as three (n1, n2) pairs

    sig <- matrix(c(1, 1, 1, 3, 1, 10), nrow = 2, byrow = FALSE) # variances, one column per pair

    mu <- c(5, 5, 5) # mean is constant across all conditions

    mvrnorm(n, mu, sig) # my attempt; fails as written

I'm sure that I have to iterate through my vector of sample sizes, just as I would if I were only generating one sample size per condition. But since I'm generating two sample sizes for each condition, I am unsure how to do this.


Solution

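  • If I understand you correctly, you want six samples: 450 draws from N(5, 1), 150 draws from N(5, 1), 300 draws from N(5, 1), 300 draws from N(5, 3), 150 draws from N(5, 1), and 450 draws from N(5, 10). The call below generates samples of those sizes. Note that it passes the second parameter as sd and leaves mean at its default of 0, so add mean = 5 (and take sqrt() of the values if 3 and 10 are variances rather than standard deviations) to match the design exactly: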

    samples <- mapply(rnorm, n = c(450,150,300,300,150,450), sd = c(1,1,1,3,1,10))
    

    I'll omit the full output because of its size, but the structure shows what I mean:

    str(samples)
    List of 6
     $ : num [1:450] 0.785 -0.21 0.192 -0.265 -0.501 ...
     $ : num [1:150] 1.224 -0.315 -0.131 -0.923 0.407 ...
     $ : num [1:300] -0.413 -1.081 0.469 1.332 0.244 ...
     $ : num [1:300] -0.748 -0.628 0.753 1.4 3.883 ...
     $ : num [1:150] 0.376 -1.193 1.133 1.839 1.528 ...
     $ : num [1:450] 2.19 -3.17 2.45 0.75 -8.4 ...
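
The draws above are centered at 0 because mean is left at its default. Here is a minimal sketch of the same call with the mean fixed at 5, assuming the values 1, 3 and 10 are variances (so rnorm needs their square roots). The seed and the names n_vec, var_vec and samples5 are made up for this illustration:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

n_vec   <- c(450, 150, 300, 300, 150, 450)  # sample sizes from the question
var_vec <- c(1, 1, 1, 3, 1, 10)             # assumed to be variances

# rnorm() expects a standard deviation, so take the square root of each
# variance; MoreArgs passes the constant mean of 5 to every call.
samples5 <- mapply(rnorm, n = n_vec, sd = sqrt(var_vec),
                   MoreArgs = list(mean = 5), SIMPLIFY = FALSE)

lengths(samples5)                    # 450 150 300 300 150 450
vapply(samples5, var, numeric(1))    # sample variances, near 1 1 1 3 1 10
```

If the values are already standard deviations, drop the sqrt() call.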
    

    Then you want to run some sort of test between samples[[1]] and samples[[2]], then between samples[[3]] and samples[[4]], and finally between samples[[5]] and samples[[6]]. I don't know what test you intend to run, but that should be straightforward if you have a function for the test: just feed in the proper list elements.
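
As a rough sketch of feeding the list elements pairwise into a test, the following uses Welch's t.test purely as a stand-in, since the actual statistic is not specified. The seed and the names first and p_values are illustrative assumptions:

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

samples <- mapply(rnorm, n = c(450, 150, 300, 300, 150, 450),
                  sd = c(1, 1, 1, 3, 1, 10), SIMPLIFY = FALSE)

# Index of the first member of each pair: 1, 3, 5
first <- seq(1, length(samples), by = 2)

# t.test() is only a placeholder for whatever statistic is being studied
p_values <- vapply(first, function(i) {
    t.test(samples[[i]], samples[[i + 1]])$p.value
}, numeric(1))

p_values  # one p-value per pair of samples
```

Swapping in another test only requires replacing the body of the anonymous function, as long as it returns a single number.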

    Update

    Based on the comment, what you need to get all the sample combinations you want is

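    f <- function(sample_size_pairs, sd_pairs) {
        # One element per row: a pair of samples drawn with that row's sizes and sds
        return(sapply(seq_len(nrow(sample_size_pairs)), function(i) {
            # sd_pairs[i, ] selects the whole row; sd_pairs[i] would pick a single element
            mapply(rnorm, n = sample_size_pairs[i, ], sd = sd_pairs[i, ])
        }))
    }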
    
    sample_sizes <- matrix(c(rep(c(450, 150), 3), rep(c(150, 450), 3),
                             rep(c(300, 300), 3)), ncol = 2, byrow = TRUE)
    sds <- matrix(rep(c(1, 1, 1, 3, 1, 10), 3), ncol = 2, byrow = TRUE)
    
    g <- f(sample_sizes, sds)
    
    str(g)
    
    List of 9
     $ :List of 2
      ..$ : num [1:450] 1.4243 1.733 0.5004 -0.8036 -0.0101 ...
      ..$ : num [1:150] -0.0607 0.1797 0.3787 -0.6676 -1.4352 ...
     $ :List of 2
      ..$ : num [1:450] -0.0766 -0.1407 -0.4893 0.2251 1.0174 ...
      ..$ : num [1:150] -1.8814 -1.3532 -1.2888 -0.0542 0.2637 ...
     $ :List of 2
      ..$ : num [1:450] 1.945 -1.375 -1.258 0.292 -0.208 ...
      ..$ : num [1:150] -1.291 -0.557 -1.199 1.385 -2.062 ...
     $ :List of 2
      ..$ : num [1:150] -2.461 -0.345 -1.454 -0.286 0.942 ...
      ..$ : num [1:450] -0.75 -0.636 -0.488 1.818 -0.585 ...
     $ :List of 2
      ..$ : num [1:150] -1.238 -0.765 -1.447 -1.153 -1.466 ...
      ..$ : num [1:450] 2.5461 0.9368 -0.0503 -0.9727 -1.4101 ...
     $ :List of 2
      ..$ : num [1:150] 0.7209 2.4342 -0.7617 0.0285 -1.3297 ...
      ..$ : num [1:450] -0.6882 0.0927 -0.8981 -0.4088 1.3421 ...
     $ : num [1:300, 1:2] 2.217 -0.161 -0.976 0.26 -0.362 ...
     $ : num [1:300, 1:2] 0.456 -0.112 -0.541 3.759 0.32 ...
     $ : num [1:300, 1:2] 0.165 0.247 -0.187 -0.624 -1.335 ...
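
One caveat visible in the output above: the equal-size (300, 300) cells come back as matrices because mapply simplifies equal-length results, while the other cells stay lists. Below is a sketch of a variant that keeps every cell as a list of two vectors and crosses each size pair with each variance pair. It assumes a mean of 5 and that 1, 3 and 10 are variances. simulate_design and its arguments are hypothetical names, not part of the answer above:

```r
# Hypothetical helper for illustration only
simulate_design <- function(size_pairs, var_pairs, mu = 5) {
    # One row per design cell: every size pair crossed with every variance pair
    cells <- expand.grid(size = seq_len(nrow(size_pairs)),
                         var  = seq_len(nrow(var_pairs)))
    lapply(seq_len(nrow(cells)), function(k) {
        n  <- size_pairs[cells$size[k], ]
        s2 <- var_pairs[cells$var[k], ]
        # SIMPLIFY = FALSE keeps a list of two vectors even when n1 == n2
        mapply(rnorm, n = n, sd = sqrt(s2),
               MoreArgs = list(mean = mu), SIMPLIFY = FALSE)
    })
}

size_pairs <- matrix(c(450, 150, 300, 300, 150, 450), ncol = 2, byrow = TRUE)
var_pairs  <- matrix(c(1, 1, 1, 3, 1, 10), ncol = 2, byrow = TRUE)

set.seed(123)  # arbitrary seed
design <- simulate_design(size_pairs, var_pairs)

length(design)        # 9 cells
lengths(design[[2]])  # 300 300, still a list of two vectors
```

Because expand.grid varies its first column fastest, the cells are ordered by size pair within each variance pair.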