Search code examples
rbin-packing

Creating groups of equal sum in R


I am trying to group a column of my data.frame/data.table into three groups, all with equal sums.

The data is first ordered from smallest to largest, such that group one would be made up of a large number of rows with small values, and group three would have a small number of rows with large values. This is accomplished in spirit with:

test <- data.frame(x = as.numeric(1:100000))
store <- 0
total <- sum(test$x)

for(i in 1:100000){

  store <- store + test$x[i]

  if(store < total/3){

    test$y[i] <- 1

  } else {

      if(store < 2*total/3){

        test$y[i] <- 2

      } else { 

        test$y[i] <- 3

      }     
  }    
}

While successful, I feel like there must be a better way (and maybe a very obvious solution that I am missing).

  • I never like resorting to loops, especially with nested ifs, when a vectorized approach is available - with even 100,000+ records this code becomes quite slow
  • This method would become impossibly complex to code to a larger number of groups (not necessarily the looping, but the ifs)
  • Requires pre-ordering of the column. Might not be able to get around this one.

As a nuance (not that it makes a difference) but the data to be summed would not always (or ever) be consecutive integers.


Solution

  • Maybe with cumsum:

    test$z <- cumsum(test$x) %/% (ceiling(sum(test$x) / 3)) + 1