
R - faster alternative to hist(XX, plot=FALSE)$count


I am looking for a faster alternative to R's hist(x, breaks=XXX, plot=FALSE)$count, because I don't need any of the other output it produces. I want to use it inside an sapply call with 1 million iterations, each of which calls this function, so its speed matters. For example:

x = runif(100000000, 2.5, 2.6)
bincounts = hist(x, breaks=seq(0,3,length.out=100), plot=FALSE)$count

Any thoughts?
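To make the use case concrete, here is a minimal sketch of the sapply pattern described above. It is scaled down so it runs quickly: the 5 samples of 1,000 values are an illustrative assumption, not the real workload. It also shows that 100 break points give 99 bin counts, and that the element returned by hist is named counts (the $count in the question works through R's partial matching of list names).

```r
# Illustrative stand-in for the real workload: 5 samples of 1,000 values
# each instead of 1 million iterations over much larger vectors.
set.seed(42)
breaks  <- seq(0, 3, length.out = 100)   # 100 break points -> 99 bins
samples <- replicate(5, runif(1000, 2.5, 2.6), simplify = FALSE)

# One hist() call per sample, keeping only the counts component.
counts <- sapply(samples, function(s) hist(s, breaks = breaks, plot = FALSE)$counts)

dim(counts)       # 99 rows (one per bin) by 5 columns (one per sample)
colSums(counts)   # each column sums to 1000
```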


Solution

  • A first attempt using table and cut:

    table(cut(x, breaks=seq(0,3,length.out=100)))
    

    It avoids the extra output, but takes about 34 seconds on my computer:

    system.time(table(cut(x, breaks=seq(0,3,length.out=100))))
       user  system elapsed 
     34.148   0.532  34.696 
    

    compared to 3.5 seconds for hist:

    system.time(hist(x, breaks=seq(0,3,length.out=100), plot=FALSE)$count)
       user  system elapsed 
      3.448   0.156   3.605
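A quick sketch, on a smaller sample chosen for illustration, confirming that table(cut(...)) and hist report the same 99 counts; the only difference in the result is that table attaches interval labels as names. The data are rounded to 3 decimals so that no value lies on or extremely close to a break point, where hist's small break fuzz can shift a value into the neighbouring bin.

```r
set.seed(1)
x <- round(runif(1e5, 2.5, 2.6), 3)    # rounded: no value on or next to a break
breaks <- seq(0, 3, length.out = 100)

tab <- table(cut(x, breaks = breaks))   # 99 counts, named by (a,b] interval
h   <- hist(x, breaks = breaks, plot = FALSE)$counts

length(tab)                # 99, same as length(h)
all(as.vector(tab) == h)   # the counts agree once the names are dropped
```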
    

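    Using tabulate and .bincode is slightly faster than hist (3.1 vs 3.6 seconds elapsed). Since 100 breaks define 99 bins, nbins should be 99; nbins=100 would add an extra, always-empty bin:

    tabulate(.bincode(x, breaks=seq(0,3,length.out=100)), nbins=99)
    
    system.time(tabulate(.bincode(x, breaks=seq(0,3,length.out=100)), nbins=99))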
       user  system elapsed 
      3.084   0.024   3.107
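A sketch of the .bincode approach with nbins derived from the breaks rather than hard-coded, checked against hist on a small rounded sample (the sample is an assumption for illustration). .bincode returns NA for values outside the breaks, and tabulate silently ignores NA and out-of-range indices, so no filtering step is needed.

```r
set.seed(1)
x <- round(runif(1e5, 2.5, 2.6), 3)     # rounded so no value sits on a break
breaks <- seq(0, 3, length.out = 100)
nb <- length(breaks) - 1L               # 99 bins, not 100

# .bincode() returns each value's bin index; right = TRUE gives (a, b]
# intervals, as hist() and cut() use by default.
# tabulate() counts indices 1..nb and ignores NA.
bc <- tabulate(.bincode(x, breaks = breaks, right = TRUE), nbins = nb)

h <- hist(x, breaks = breaks, plot = FALSE)$counts
length(bc)     # 99
all(bc == h)   # same counts as hist() for this data
```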
    

    Using tabulate and findInterval is roughly 17 times faster than table and cut and roughly 1.75 times faster than hist (elapsed times):

    tabulate(findInterval(x, vec=seq(0,3,length.out=100)), nbins=99)
    
    system.time(tabulate(findInterval(x, vec=seq(0,3,length.out=100)), nbins=99))
       user  system elapsed 
      2.044   0.012   2.055
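One detail worth knowing: findInterval uses left-closed intervals [a, b) by default, whereas hist (with its default right = TRUE) and .bincode use right-closed (a, b]. For continuous data like runif output this almost never matters, but findInterval's left.open = TRUE argument matches the hist convention. The sketch below checks this on a small rounded sample and wraps the call for the sapply use case; bin_counts is a hypothetical helper name, not an existing function.

```r
set.seed(1)
x <- round(runif(1e5, 2.5, 2.6), 3)    # rounded so no value sits on a break
breaks <- seq(0, 3, length.out = 100)
nb <- length(breaks) - 1L              # 99 bins

# left.open = TRUE gives (a, b] intervals, matching hist()'s default.
# Index 0 (at or below the first break) and nb + 1 (above the last break)
# are ignored by tabulate(), which only counts indices 1..nb.
fi <- tabulate(findInterval(x, breaks, left.open = TRUE), nbins = nb)
h  <- hist(x, breaks = breaks, plot = FALSE)$counts
all(fi == h)

# Hypothetical helper for the sapply() use case in the question
bin_counts <- function(v, breaks) {
  tabulate(findInterval(v, breaks, left.open = TRUE),
           nbins = length(breaks) - 1L)
}

samples <- replicate(3, runif(1000, 2.5, 2.6), simplify = FALSE)
m <- sapply(samples, bin_counts, breaks = breaks)   # 99 x 3 matrix of counts
```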