Search code examples
rdplyrcumsum

Use of cumsum() iterativley in one column


Is it possible to use cumsum() iteratively in one column with start - stop conditional on other column:

  1. given the dataframe df with one column X where values are ascending.
  2. cumsum() should stop when reaching 10 or a multiple of ten (e.g.20,30,40,...).
  3. then after reaching this point (10,20,30,40,..) cumsum() should start new... and so on.
  4. In case of ties (20,20 or 30,30) cumsum should stop in the last occurrence of 10, 20, 30, 40,... this is the dataframe:
df <- structure(list(X = c(55L, 95L, 39L, 52L, 22L, 93L, 76L, 82L, 
77L, 58L, 60L, 19L, 31L, 43L, 65L, 56L, 18L, 66L, 21L, 49L, 13L, 
37L, 36L, 51L, 41L, 7L, 91L, 3L, 11L, 65L, 51L, 32L, 25L, 10L, 
5L, 7L, 8L, 3L, 72L, 66L, 93L, 24L, 48L, 44L, 91L, 60L, 62L, 
89L, 100L, 69L, 5L, 89L, 54L, 19L, 39L, 93L, 60L, 64L, 89L, 81L, 
24L, 9L, 51L, 9L, 7L, 69L, 19L, 51L, 39L, 100L, 83L, 67L, 33L, 
84L, 66L, 25L, 20L, 87L, 55L, 56L, 83L, 52L, 96L, 42L, 16L, 64L, 
45L, 30L, 55L, 29L, 16L, 73L, 40L, 29L, 92L, 6L, 38L, 12L, 38L, 
89L)), class = "data.frame", row.names = c(NA, -100L))

Many thanks!!!

desired output as an example up to 30.

enter image description here


Solution

  • UPDATE

    Case-1: To handle the ties efficiently (OP's condition-4). Let's take a different example where there are ties as well as two consecutive values divisible by 10. (I think other strategies may fail there)
    df1 <- data.frame(X = c(3, 4, 10, 10, 10, 13, 20, 30, 31, 40, 45))
    
    df1 %>% arrange(X) %>% group_by(X) %>%
      mutate(d = n(),
             d2 = row_number(),
             d2 = d2 == max(d2)) %>% ungroup() %>% 
      group_by(Y = cumsum( X %% 10 == 0 & d2)) %>%
      mutate(Y = cumsum(X)) %>% ungroup() %>%
      select(-d, -d2)
    
    # A tibble: 11 x 2
           X     Y
       <dbl> <dbl>
     1     3     3
     2     4     7
     3    10    17
     4    10    27
     5    10    10
     6    13    23
     7    20    20
     8    30    30
     9    31    61
    10    40    40
    11    45    85
    
    This can be done through accumulate also.

    Case-2: when group starts next value and ties are also handled properly

    df1 %>% arrange(X) %>% group_by(X) %>%
      mutate(d = n(),
             d2 = row_number(),
             d2 = d2 == max(d2)) %>% ungroup() %>% 
      group_by(Y = lag(cumsum( X %% 10 == 0 & d2), default = 0)) %>%
      mutate(Y = cumsum(X)) %>% ungroup() %>%
      select(-d, -d2)
    # A tibble: 11 x 2
           X     Y
       <dbl> <dbl>
     1     3     3
     2     4     7
     3    10    17
     4    10    27
     5    10    37
     6    13    13
     7    20    33
     8    30    30
     9    31    31
    10    40    71
    11    45    45
    

    Earlier Answer

    Case-3: When next cumsum starts from next value.
    df %>% arrange(X) %>%
      mutate(y = accumulate(X, .init = 0, ~ifelse(.y %% 10 == 0, 1, 0))[-nrow(df)],
             y = accumulate2(X, y, .init = 0, ~ifelse(..3 == 1, ..2, ..1 + ..2))[-1])
    
         X    y
    1     3    3
    2     3    6
    3     5   11
    4     5   16
    5     6   22
    6     7   29
    7     7   36
    8     7   43
    9     8   51
    10    9   60
    11    9   69
    12   10   79
    13   11   11
    14   12   23
    15   13   36
    16   16   52
    17   16   68
    18   18   86
    19   19  105
    20   19  124
    21   19  143
    22   20  163
    23   21   21
    24   22   43
    25   24   67
    26   24   91
    27   25  116
    28   25  141
    29   29  170
    30   29  199
    31   30  229
    32   31   31
    33   32   63
    34   33   96
    35   36  132
    36   37  169
    37   38  207
    38   38  245
    39   39  284
    40   39  323
    41   39  362
    42   40  402
    43   41   41
    44   42   83
    45   43  126
    46   44  170
    47   45  215
    48   48  263
    49   49  312
    50   51  363
    51   51  414
    52   51  465
    53   51  516
    54   52  568
    55   52  620
    56   54  674
    57   55  729
    58   55  784
    59   55  839
    60   56  895
    61   56  951
    62   58 1009
    63   60 1069
    64   60   60
    65   60   60
    66   62   62
    67   64  126
    68   64  190
    69   65  255
    70   65  320
    71   66  386
    72   66  452
    73   66  518
    74   67  585
    75   69  654
    76   69  723
    77   72  795
    78   73  868
    79   76  944
    80   77 1021
    81   81 1102
    82   82 1184
    83   83 1267
    84   83 1350
    85   84 1434
    86   87 1521
    87   89 1610
    88   89 1699
    89   89 1788
    90   89 1877
    91   91 1968
    92   91 2059
    93   92 2151
    94   93 2244
    95   93 2337
    96   93 2430
    97   95 2525
    98   96 2621
    99  100 2721
    100 100  100