Search code examples
rdplyrgroupingcumsum

Group according to cumulative sums


I want to make groups according to the accumulated sum. in my tiny example, the two first numbers in the column value sums to exactly 100 and is group A. The three next numbers sum to less than 100 and go to group B, while the last number exceeds 100 and gets its own group C - and so forth.

input <- data.frame(id=c(1:6),
                   value =c(99, 1,  33, 33, 33, 150))
input

desired_output <- data.frame(id=c(1:6),
              value =c(99, 1,  33, 33, 33, 150),
              group= c("A", "A", "B", "B", "B", "C"))

desired_output

Thank you


Solution

  • Two possible one-liners, with purrr::accumulate and with MESS::cumsumbinning:

    purrr::accumulate

    library(tidyverse)
    group_by(input, grp = LETTERS[cumsum(value == accumulate(value, ~ ifelse(.x + .y <= 100, .x + .y, .y)))])
    

    MESS::cumsumbinning

    library(dplyr)
    group_by(input, grp = LETTERS[MESS::cumsumbinning(value, 100)])
    

    output

    # A tibble: 6 x 3
    # Groups:   grp [3]
         id value grp  
      <int> <dbl> <chr>
    1     1    99 A    
    2     2     1 A    
    3     3    33 B    
    4     4    33 B    
    5     5    33 B    
    6     6   150 C