I wrote a function in R to calculate cumulative totals by month number, but the execution time of my method grows exponentially as the dataset gets larger. I'm a novice R programmer, can you help me make this more efficient?
The function and the way I invoke the function:
accumulate <- function(recordnum,df){
sumthese <- (df$subject == df$subject[recordnum]) &
(df$month <= df$month[recordnum])
sum(df$measurement[sumthese])
}
set.seed(42)
datalength = 10
df <- data.frame(measurement = runif(1:datalength),
subject=rep(c("dog","cat"),each =datalength/2),
month=rep(seq(datalength/2,1,by=-1)))
system.time(df$cumulative <- sapply(1:datalength,accumulate,df))
The input dataframe:
> df
measurement subject month
1 0.4577418 dog 5
2 0.7191123 dog 4
3 0.9346722 dog 3
4 0.2554288 dog 2
5 0.4622928 dog 1
6 0.9400145 cat 5
7 0.9782264 cat 4
8 0.1174874 cat 3
9 0.4749971 cat 2
10 0.5603327 cat 1
The output dataframe:
> df
measurement subject month cumulative
1 0.9148060 dog 5 3.6102141
2 0.9370754 dog 4 2.6954081
3 0.2861395 dog 3 1.7583327
4 0.8304476 dog 2 1.4721931
5 0.6417455 dog 1 0.6417455
6 0.5190959 cat 5 2.7524079
7 0.7365883 cat 4 2.2333120
8 0.1346666 cat 3 1.4967237
9 0.6569923 cat 2 1.3620571
10 0.7050648 cat 1 0.7050648
Notice the cumulative column shows the accumulation of all measurements up to and including the current month. The function does not require the dataframe to be sorted. When the datalength equals 100, the elapsed time is 0.3. 1000 is 0.58. 10,000 = 27.72. I need this to run for 200K+ records.
Thanks!
This is non-destructive, i.e. the original df
is not modified. No packages are used. The original order of the rows of df
is preserved; however, if that is not important then [order(o), ]
on the last line can be omitted.
o <- order(df$subject, df$month)
transform(df[o, ], cumulative = ave(measurement, subject, FUN = cumsum))[order(o), ]
giving:
measurement subject month cumulative
1 0.37955924 dog 5 2.2580530
2 0.43577158 dog 4 1.8784938
3 0.03743103 dog 3 1.4427222
4 0.97353991 dog 2 1.4052912
5 0.43175125 dog 1 0.4317512
6 0.95757660 cat 5 4.0751151
7 0.88775491 cat 4 3.1175385
8 0.63997877 cat 3 2.2297836
9 0.97096661 cat 2 1.5898048
10 0.61883821 cat 1 0.6188382