Faster function than aggregate() in R


I have the following part of my code:

 result <- aggregate(cbind(x1,x2,x3,y1,y2,y3,z1,z2,z3,w)~date, rbind(result, datanew), sum, na.rm=F) 

Is there a faster way to obtain the same result? Each time new data arrives, I want to rbind it to the old data and, in the same step, sum every column within each group (date in my real code).

For instance:

old.data=data.frame(x=c(1:3),y=c(4:6),z=c(7:9),id=c("A","B","B"))
new.data=data.frame(x=c(2:4),y=c(5:7),z=c(8:10),id=c("B","A","A"))
result <- aggregate(cbind(x,y,z)~id, rbind(old.data, new.data), sum, na.rm=F)

I am searching for a better solution because this is repeated 100000 times.

Thanks


Solution

  • I'm sure the real data is much larger, but your solution seems on point. As alternatives, I benchmarked two other approaches against it:

    Tidyverse

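    library(dplyr)  # provides %>%, group_by() and summarise_all()

    tidy_fn <- function(){
        # stack old and new rows, then sum every remaining column per id
        rbind(old.data, new.data) %>% group_by(id) %>% dplyr::summarise_all(
            function(x)sum(x)
        )
    }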
    

    plyr and base functions (I know, bad form)

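    plyr_base_fn <- function(){
      # split the combined data by id, sum the first three columns (x, y, z)
      # of each piece, then bind the per-id sums back into one data frame
      plyr::ldply(Map(function(x){
        sapply(x[1:3],sum)
        }, rbind(old.data,new.data) %>% split(., .$id)
        ))
    }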
    

    Your aggregation approach:

    agg_fn <- function(){
        aggregate(cbind(x,y,z)~id, rbind(old.data, new.data), sum, na.rm=F)
    }
    

    Results from two tests:

    1000 reps
    > microbenchmark(tidy_fn(),agg_fn(),plyr_base_fn(),times = 1000L)
    Unit: milliseconds
               expr      min       lq     mean   median       uq       max neval
          tidy_fn() 2.220585 2.386112 2.823122 2.529649 2.775300 13.425573  1000
           agg_fn() 1.668601 1.795527 2.149068 1.895666 2.062904 16.117802  1000
     plyr_base_fn() 1.253772 1.331501 1.567777 1.402464 1.526089  8.396307  1000
    
    5000 reps
    > microbenchmark(tidy_fn(),agg_fn(),plyr_base_fn(),times = 5000L)
    Unit: milliseconds
               expr      min       lq     mean   median       uq       max neval
          tidy_fn() 2.227752 2.400265 2.696034 2.542617 2.722082  12.46249  5000
           agg_fn() 1.673647 1.792085 2.067232 1.897011 2.019915 301.84694  5000
     plyr_base_fn() 1.247306 1.336010 1.503682 1.411608 1.503290  14.24656  5000
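
    One more base R option that might be worth timing is rowsum(), which computes column sums per group. It is not part of the benchmark above, so no speed claim is made here; this is only a minimal sketch on the example data. The helper name rowsum_fn and its cols argument are made up for illustration. Note that rowsum() defaults to na.rm = FALSE, while the formula interface of aggregate() drops rows containing NA by default, so results may differ if the data has missing values.

    ```r
    # Example data from the question
    old.data <- data.frame(x = 1:3, y = 4:6, z = 7:9, id = c("A", "B", "B"))
    new.data <- data.frame(x = 2:4, y = 5:7, z = 8:10, id = c("B", "A", "A"))

    # Hypothetical helper (name and arguments are assumptions, not from the post):
    # stack both data frames and sum the chosen columns per id with rowsum()
    rowsum_fn <- function(old, new, cols = c("x", "y", "z")) {
      combined <- rbind(old, new)
      # rowsum() returns one row per group, with the group labels as row names
      sums <- rowsum(combined[cols], group = combined$id)
      # move the group labels into an id column, as aggregate() does
      out <- data.frame(id = rownames(sums), sums)
      rownames(out) <- NULL
      out
    }

    result <- rowsum_fn(old.data, new.data)
    print(result)
    # id A: x = 8, y = 17, z = 26; id B: x = 7, y = 16, z = 25
    ```

    To compare it with the functions above, it could be added to the same microbenchmark() call, for example as rowsum_fn(old.data, new.data).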