I want to create a function which helps characterise the results to some simulations. For the purposes of this post let the simulation function be:
example_sim <- function(time=100, npops=5){
result <- data.frame(matrix(NA, nrow = time, ncol = npops))
colnames(result) <- LETTERS[1:npops]
for(i in 1:npops){
sim <- sample.int(time, time)
result[,i] <- sim
result[,i] <- result[,i]*i
}
return(result)
}
This creates a data frame with varying length and width based on the number of populations (npops) and the time simulated.
I want to create a function which uses the output of such simulations and characterises the mean, variance for each population over an n amount of simulations (nsims).
So far I have managed to get it working for two populations with the following code:
library("matrixStats")
library("reshape2")
ensembles <- function(nsims=10, time = 100, npops = 2){
result_N.A <- data.frame(matrix(NA, nrow = time, ncol = nsims))
result_N.B <- data.frame(matrix(NA, nrow = time, ncol = nsims))
for( i in 1:(nsims)){
simulation_with_2pops <- example_sim(time=100,npops=2)
result_N.A[,i] <- simulation_with_2pops[,1]
result_N.B[,i] <- simulation_with_2pops[,2]
}
output <- simulation_with_2pops
for( j in 1:params$ntime){
output$meanA[j] <- rowMeans(result_N.A[j,])
}
for( j in 1:params$ntime){
output$meanB[j] <- rowMeans(result_N.B[j,])
}
for( j in 1:params$ntime){
output$varA[j] <- rowVars(as.matrix(result_N.A[j,]))
}
for( j in 1:params$ntime){
output$varB[j] <- rowVars(as.matrix(result_N.B[j,]))
}
return(output)
}
ensembles_output<- ensembles(nsims = 10)
ensembles_output
To fully implement the function for any number of populations I would need to create another for loop where I create and update the result_N.A
object. (Presumably called something like result[i]
.)
I have also thought about creating a 3 dimensional object (time, npops, nsims) and taking a slice of it to calculate the mean and variance but i havent had much success yet.
I am not married for this route and am very open to other recommendations.
Eventually I would like to create a code where the covariance and correlation are also calculated by giving highlighting two populations in the parameters. (for instance population A and population E). If you have any ideas on the implementation i would be very grateful to hear them.
Thank you for considering this problem.
I think using a multidimensional array is a very good idea in this case.
First, you can get the simulations of example_sim()
much cheaper using mapply()
. Here an example with time=10
and npops=3
. Use the same set.seed(42)
and parameters and check for yourself.
I use much smaller parameters here so that you can easily check the result in your head.
set.seed(42)
sim <- replicate(nsims, mapply(\(time, i) sample.int(time, time)*i, 10, 1:3))
sim
# , , 1
#
# [,1] [,2] [,3]
# [1,] 1 16 27
# [2,] 5 14 30
# [3,] 10 8 9
# [4,] 8 2 12
# [5,] 2 10 15
# [6,] 4 20 18
# [7,] 6 4 3
# [8,] 9 12 6
# [9,] 7 18 24
# [10,] 3 6 21
#
# , , 2
#
# [,1] [,2] [,3]
# [1,] 3 10 18
# [2,] 1 8 6
# [3,] 2 4 12
# [4,] 6 16 9
# [5,] 10 6 30
# [6,] 8 2 15
# [7,] 4 20 27
# [8,] 5 14 21
# [9,] 7 12 24
# [10,] 9 18 3
#
# , , 3
#
# [,1] [,2] [,3]
# [1,] 10 8 18
# [2,] 8 18 6
# [3,] 5 6 27
# [4,] 1 16 3
# [5,] 7 10 24
# [6,] 4 12 15
# [7,] 6 20 30
# [8,] 2 4 9
# [9,] 9 2 12
# [10,] 3 14 21
Next, I believe you want to gather row-wise statistics across each population column A, B, C, ... . Here you basically want apply(., MARGINS=1:2, FUN)
. Just for the mean there exists rowMeans(., dims=2L)
, which is faster.
rowMeans(sim, dims=2L)
# [,1] [,2] [,3]
# [1,] 4.666667 11.333333 21
# [2,] 4.666667 13.333333 14
# [3,] 5.666667 6.000000 16
# [4,] 5.000000 11.333333 8
# [5,] 6.333333 8.666667 23
# [6,] 5.333333 11.333333 16
# [7,] 5.333333 14.666667 20
# [8,] 5.333333 10.000000 12
# [9,] 7.666667 10.666667 20
# [10,] 5.000000 12.666667 15
apply(sim, 1:2, var)
# [,1] [,2] [,3]
# [1,] 22.333333 17.333333 27
# [2,] 12.333333 25.333333 192
# [3,] 16.333333 4.000000 93
# [4,] 13.000000 65.333333 21
# [5,] 16.333333 5.333333 57
# [6,] 5.333333 81.333333 3
# [7,] 1.333333 85.333333 219
# [8,] 12.333333 28.000000 63
# [9,] 1.333333 65.333333 48
# [10,] 12.000000 37.333333 108
I'm not sure however, why you use simulation_with_2pops
for your final output, since it's the result of last iteration of for (i in 1:nsims)
loop. Anyway, hope this helps you further.
Note: R >= 4.1 used.