I just want to say first that I'm pretty new to R coding. I wrote some R code that will run over thousands of iterations. The code works and gets the results I need, but it takes far too long to run. I'll first explain what the code is doing and then show the code itself. How can I make this more efficient so that it runs in a reasonable time over 200K+ iterations?
There is a while loop which runs until the total dollars reach the target dollars. First I generate a random number, which I look up in the Prob column of the first table below to get the corresponding Dist entry (stored as a string). I parse that string, draw a value from the distribution and add it to a vector. Then I use this value to do another lookup in the second table below to get a factor, and I save these factors for each value in a second vector (a single-draw sketch follows the tables). I repeat this loop until I reach my target dollars, then multiply the two vectors to get my result vector. This while loop is then run 200K+ times.
Prob   Range      Dist
0.12   5000       rgamma(1, 3, , 900) + 1000
0.70   100000     rgamma(1, 1, , 900) + 5000
0.85   350000     rgamma(1, 0.9, , 150000) + 200000
0.95   1500000    rgamma(1, 0.8, , 230000) + 200000
1.00   2500000    runif(1, 1500000, 2500000)
Range     Factor
5000      rweibull(1, 20, 1.1)
100000    rweibull(1, 30, 1.2)
250000    rweibull(1, 25, 1.5)
2500000   rweibull(1, 25, 1.8)
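To make the lookup concrete, here is roughly what a single draw looks like (just a sketch using the table values above; the real loop with clamping and running totals is in the full code further down):
# sketch of one draw: map a uniform random number to a row of the probability
# table, evaluate that row's distribution, then use the resulting size to pick
# a factor from the second table
prob <- c(0.12, 0.70, 0.85, 0.95, 1.0)
dist <- c("rgamma(1, 3, , 900) + 1000", "rgamma(1, 1, , 900) + 5000",
          "rgamma(1, 0.9, , 150000) + 200000", "rgamma(1, 0.8, , 230000) + 200000",
          "runif(1, 1500000, 2500000)")
sizeRange <- c(5000, 100000, 250000, 2500000)
fact <- c("rweibull(1, 20, 1.1)", "rweibull(1, 30, 1.2)",
          "rweibull(1, 25, 1.5)", "rweibull(1, 25, 1.8)")

row_i   <- which.max(prob > runif(1))        # first row whose Prob exceeds the draw
tmpSize <- eval(parse(text = dist[row_i]))   # value drawn from that row's distribution
row_f   <- which.max(sizeRange > tmpSize)    # first factor row whose Range exceeds the size
fac     <- eval(parse(text = fact[row_f]))   # factor to apply to that size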
Sample code is below. I've used dummy values in many places, and the real code has a few more operations similar to these. Running this 100 times takes about a minute. When I run it thousands of times, it will take far too long. How can I make this code more efficient?
t <- proc.time()

# inputs
sims <- 100
totalD <- 0
totalRev <- c(150000000)
i <- 0
set.seed(1)

ProbRnge <- matrix(c(0.12, 0.70, 0.85, 0.95, 1,
                     5000, 100000, 350000, 1500000, 2500000,
                     1000, 5000, 100000, 350000, 1500000), ncol = 3)
Dis1 <- c("rgamma(1, 3.0268, , 931.44) + 1000", "rgamma(1, 1.0664, , 931.44) + 5000",
          "rgamma(1, 1.0664, , 931.44) + 5000", "rgamma(1, 1.0664, , 931.44) + 5000",
          "runif(1, 1250000, 2000000)")
SizeRnge <- c(5000, 100000, 250000, 2500000)
Dis2 <- c("rweibull(1, 20, 1.1)", "rweibull(1, 30, 1.2)", "rweibull(1, 25, 1.5)",
          "rweibull(1, 25, 1.8)")

# simulation loop
for (j in 1:sims) {
  TotalDTemp <- vector()
  FacTmp <- vector()

  # loop while the simulated total is below the target total
  while (totalD < totalRev[1]) {
    i <- i + 1

    # find where the random number falls in the range, look up the distribution,
    # draw a value and store it in the vector
    row_i <- which.max(ProbRnge[, 1] > runif(1))
    tmpSize <- max(min(eval(parse(text = Dis1[row_i])), ProbRnge[row_i, 2]), ProbRnge[row_i, 3])
    if (totalD + tmpSize > totalRev[1]) {
      tmpSize <- totalRev[1] - totalD
    }
    totalD <- totalD + tmpSize
    TotalDTemp[i] <- tmpSize

    # take the value, look up the factor to apply and store it in the second vector
    row_i <- which.max(SizeRnge > tmpSize)
    tempRTR <- max(min(eval(parse(text = Dis2[row_i])), 2), 1)
    FacTmp[i] <- tempRTR
  }

  DfacTotal <- TotalDTemp * FacTmp
  totalD <- 0
  i <- 0
}

proc.time() - t
If you profile your code, you'll see that most of the time is spent parsing the expressions. You can do that parsing beforehand (before the loops) by computing
expr1 <- lapply(Dis1, function(text) parse(text = text))
expr2 <- lapply(Dis2, function(text) parse(text = text))
and then using eval(expr1[[row_i]]) instead of eval(parse(text = Dis1[row_i])).
For me, this reduces computation time from 45 sec to less than 2 sec.
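For completeness, here is roughly how that slots into your loop (a sketch only; it assumes the ProbRnge, Dis1, SizeRnge, Dis2, sims, totalD, totalRev and i objects from your question are already defined):
# parse the distribution strings once, outside the loops
expr1 <- lapply(Dis1, function(text) parse(text = text))
expr2 <- lapply(Dis2, function(text) parse(text = text))

for (j in 1:sims) {
  TotalDTemp <- vector()
  FacTmp <- vector()
  while (totalD < totalRev[1]) {
    i <- i + 1
    row_i <- which.max(ProbRnge[, 1] > runif(1))
    # evaluate the pre-parsed expression instead of re-parsing the string
    tmpSize <- max(min(eval(expr1[[row_i]]), ProbRnge[row_i, 2]), ProbRnge[row_i, 3])
    if (totalD + tmpSize > totalRev[1]) tmpSize <- totalRev[1] - totalD
    totalD <- totalD + tmpSize
    TotalDTemp[i] <- tmpSize
    # same idea for the factor lookup
    row_i <- which.max(SizeRnge > tmpSize)
    FacTmp[i] <- max(min(eval(expr2[[row_i]]), 2), 1)
  }
  DfacTotal <- TotalDTemp * FacTmp
  totalD <- 0
  i <- 0
}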