I am simulating the number of events from a Poisson distribution (with parameter 9). For each event, I am simulating the price of the event using a lognormal distribution with parameters 2 and 1.1
I am running 100 simulations (each simulation represents a year). The code (which I am happy with) is:
simul <- list()
for(i in 1:100) {simul[[i]] <- rlnorm (rpois(1, 9), meanlog = 2, sdlog = 1.1) }
My issue is that the output "simul" is a list of lists and I don't know how to apply basic operations to it.
I want to be able to: 1. cap each individual simulated value (due to budget constraints) 2. obtain the total of all the simulated values, separately for each year (with and without capping) 3. obtain the mean value for each individual year (with and without capping) 4. calculate the 95th percentile for each year (with and without capping) 5. output the results into a dataframe (so that one column represents the total, one the mean, one the percentile etc) and each row represents a year
Something that seems to work is me pulling out individual lists:
sim1 <- simul[1]
I can now use "unlist" to flatten the list and apply any operations I want.
sims1 <- data.frame(unlist(sim1), nrow=length(sim1), byrow=F)
sims1 <- subset(sims1, select = c(unlist.sim1.))
colnames(sims1) <- "sim1"
quantile <- data.frame(quantile(sims1$sim1, probs = c(0.95)))
But I dont want to write the above logic 100 times for each sublist in the list... is there a way around it?
Any help on this would be much appreciated.
You actually don't have a list of lists, you have a list of vectors. With lists, a single bracket subset will return a list, e.g. simul[1:3]
will return a list of the first 3 items of simul
, and for consistency simul[1]
returns a list of the first item of simul
.
To extract an element, rather than a length-1 list, use [[
. simul[[1]]
is a vector you don't need to run unlist()
on.
The nice thing about lists is you can work on them with for
loops or with lapply
/sapply
functions. For example
raw_means = sapply(simul, mean)
raw_sums = sapply(simul, sum)
raw_95 = sapply(simul, quantile, probs = 0.95)
result = data.frame(raw_means, raw_sums, raw_95)
Or with a loop,
raw_means = raw_sums = raw_95 = numeric(length(simul))
for (i in seq_along(simul)) {
raw_means[i] = mean(simul[[i]])
raw_sums[i] = sum(simul[[i]])
raw_95[i] = quantile(simul[[i]], probs = 0.95)
}
result = data.frame(raw_means, raw_sums, raw_95)
When you say "cap", I'm not sure if you mean subset or reduce values (e.g., with pmin
), so I'll leave that to you. But I'd recommend making a new list, like simul_cap = lapply(simul, pmin, 9)
(if that's the operation you want) and running the same code on your capped list. You could even make the summary statistics a function, so instead of copy-pasting a bunch you end up doing raw_result = foo(simul)
and cap_result = foo(simul_cap)
.