Search code examples
rvectorgroupingcumulative-sumcumsum

Group numeric vector by predefined maximal group sum


I have a numeric vector like this x <- c(1, 23, 7, 10, 9, 2, 4) and I want to group the elements from left to right with the constrain that each group sum must not exceed 25. Thus, here the first group is c(1, 23), the second is c(7, 10) and the last c(9, 2, 4). the expected output is a dataframe with a second column containing the groups:

data.frame(x= c(1, 23,  7,  10,  9,  2,  4), group= c(1, 1, 2, 2, 3, 3, 3))

I have tried different things with cumsum but am not able to kind of dynamically restart cumsum for the new group once the limit sum of 25 for the last group is reached.


Solution

  • Here is a solution using base R and cumsum (and lapply for iteration):

    id <- c(seq(1, length(x),1)[!duplicated(cumsum(x) %/% 25)], length(x)+1)
    id2 <- 1:length(id)
    group <- unlist(lapply(1:(length(id)-1), function(x) rep(id2[x], diff(id)[x])))
    data.frame(x=x, group=group)
    
       x group
    1  1     1
    2 23     1
    3  7     2
    4 10     2
    5  9     3
    6  2     3
    7  4     3
    

    Edit: New Approach using recursive function

    Here is a new more efficient approach that should also cover the special case which @ЕгорШишунов considered and should work efficiently because it's written as a recursive function.

     recursiveFunction<- function(x, maxN=25, sumX=0, period=1, period2return=c()){
          sumX <- sumX + x[1]
          if (sumX >= maxN) { sumX=x[1]; period = period + 1}
          period2return <- c(period2return, period)
          if (length(x) == 1) { return(period2return)}
          return(recursiveFunction(x[-1], 25, sumX, period, period2return))
        }
        
        recursiveFunction(x, maxN=25)
    

    Note that you should not change the entries for the last three function parameters (sumX=0, period=1, period2return=c()) because they are only important during the recursive call of the function.