r purrr rolling-computation cumsum accumulate

Keeping the max constant within a group when using base::cumsum


Using the data below, how can I make the cumsum_a column match the should column?

Data to start with:

> demo
    th seq group
1 20.1   1    10
2 24.1   2    10
3 26.1   3    10
4  1.1   1    20
5  2.1   2    20
6  4.1   3    20

The "should" column below is the goal.

demo <- data.frame(
    th    = c(20.1, 24.1, 26.1, 1.1, 2.1, 4.1),
    seq   = c(1:3, 1:3),
    group = c(rep(10, 3), rep(20, 3))
)

library(magrittr)
library(dplyr)

demo %>%
    group_by(group) %>%
    mutate(
        cumsum_a = cumsum(group^seq * (th / cummax(th)))
    ) %>%
    ungroup() %>%
    mutate(
        # As an example only: this manually does exactly what cumsum_a is doing (which is wrong)
        cumsum_m = c(
            10^1*20.1/20.1,                                    # good
            10^1*20.1/20.1 + 10^2*24.1/24.1,                   # different denominators, bad
            10^1*20.1/20.1 + 10^2*24.1/24.1 + 10^3*26.1/26.1,  # different denominators, bad
            20^1*1.1/1.1,                                      # good
            20^1*1.1/1.1 + 20^2*2.1/2.1,                       # different denominators, bad
            20^1*1.1/1.1 + 20^2*2.1/2.1 + 20^3*4.1/4.1         # different denominators, bad
        ),
        # This is exactly the kind of calculation I want
        should = c(
            10^1*20.1/20.1,                                    # good
            10^1*20.1/24.1 + 10^2*24.1/24.1,                   # good
            10^1*20.1/26.1 + 10^2*24.1/26.1 + 10^3*26.1/26.1,  # good
            20^1*1.1/1.1,                                      # good
            20^1*1.1/2.1 + 20^2*2.1/2.1,                       # good
            20^1*1.1/4.1 + 20^2*2.1/4.1 + 20^3*4.1/4.1         # good
        )
    )

Put simply, every term in a given row must share the same denominator. On the second row, both terms should be divided by 24.1, rather than the first by 20.1 and the second by 24.1 as in cumsum_m (and in the calculation underlying cumsum_a).
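The difference between the two columns can be written out explicitly. This is only a sketch in notation that does not appear in the original post: within one group, let g be the group value, s_j the seq value and th_j the th value of row j, and let m_i be the running maximum of th up to row i (what cummax(th) returns).

```latex
\begin{aligned}
\text{cumsum\_a}_i &= \sum_{j \le i} g^{s_j}\,\frac{\mathit{th}_j}{m_j}
  && \text{(each term keeps its own denominator)} \\
\text{should}_i &= \sum_{j \le i} g^{s_j}\,\frac{\mathit{th}_j}{m_i}
  = \frac{1}{m_i} \sum_{j \le i} g^{s_j}\,\mathit{th}_j
  && \text{(all terms share the current row's denominator)}
\end{aligned}
```

In this data th increases within each group, so m_i equals th_i, which is why the hand-written should column divides by 24.1 and 26.1 directly.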

Here are the new columns, where should is what cumsum_a or cumsum_m should be.

     th   seq group cumsum_a cumsum_m should
  <dbl> <int> <dbl>    <dbl>    <dbl>  <dbl>
1  20.1     1    10       10       10    10 
2  24.1     2    10      110      110   108.
3  26.1     3    10     1110     1110  1100.
4   1.1     1    20       20       20    20 
5   2.1     2    20      420      420   410.
6   4.1     3    20     8420     8420  8210.

Solution

  • You can use the following solution:

    • purrr::accumulate takes a two-argument function. The first argument, .x (or ..1), is the accumulated value from the previous iterations; the second, .y, is the current element of the vector being iterated over (here 2:n()). The first accumulated value is the first element of group, which I supplied as the .init argument (this matches the first row's own term because seq starts at 1 in each group and th[1] / cmax[1] is always 1).
    • Since you want to change the denominator of the previous iterations' results, I multiply the accumulated result .x by the ratio of the previous value of cmax to the current value of cmax.
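The two points above can be tried on plain vectors before reading the grouped version. The following is a minimal sketch, not the answer's own code: the names running, term, rescaled and direct are made up for illustration, and only group 10 from the question is used.

```r
library(purrr)

# .init seeds the result; the function is then called once per element of 2:4,
# with .x = value accumulated so far and .y = the current element.
running <- accumulate(2:4, ~ .x + .y, .init = 1)
running
# 1 3 6 10

# Group 10 from the question, as plain vectors.
th   <- c(20.1, 24.1, 26.1)
cmax <- cummax(th)               # running maximum of th
term <- 10^(1:3) * th / cmax     # each row's own term, over its own cmax

# Rescale the accumulated value from the previous denominator to the
# current one, then add the current row's term.
rescaled <- accumulate(
  2:3,
  ~ .x * cmax[.y - 1] / cmax[.y] + term[.y],
  .init = term[1]
)

# Direct computation for comparison: every term over the current row's cmax.
direct <- sapply(1:3, function(i) sum(10^(1:i) * th[1:i]) / cmax[i])

all.equal(rescaled, direct)
```

Multiplying .x by cmax[.y - 1] / cmax[.y] cancels the old denominator and applies the new one, so after each step the whole sum is expressed over the current row's running maximum.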

    I think the rest is pretty clear, but if you have any more questions about it, just let me know.

    library(dplyr)
    library(purrr)
    
    demo %>%
      group_by(group) %>%
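      mutate(cmax = cummax(th),  # running maximum of th within the group
             # .x: sum accumulated so far (over the previous cmax); .y: current row index
             should = accumulate(2:n(), .init = group[1], 
                                 ~ (.x * cmax[.y - 1] / cmax[.y]) + (group[.y] ^ seq[.y]) * (th[.y] / cmax[.y])))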
    
    # A tibble: 6 x 5
    # Groups:   group [2]
         th   seq group  cmax should
      <dbl> <int> <dbl> <dbl>  <dbl>
    1  20.1     1    10  20.1    10 
    2  24.1     2    10  24.1   108.
    3  26.1     3    10  26.1  1100.
    4   1.1     1    20   1.1    20 
    5   2.1     2    20   2.1   410.
    6   4.1     3    20   4.1  8210.
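Because every term in a row is divided by the same running maximum, the denominator can be factored out of the sum, which suggests a vectorized alternative that needs no accumulate call. This is a sketch rather than part of the answer above; the column name should_cf and the vector expected are made up for illustration, and it assumes (as the answer does) that the intended denominator is cummax(th).

```r
library(dplyr)

demo <- data.frame(
  th    = c(20.1, 24.1, 26.1, 1.1, 2.1, 4.1),
  seq   = c(1:3, 1:3),
  group = c(rep(10, 3), rep(20, 3))
)

# Accumulate the numerators only, then divide once by the running maximum.
closed_form <- demo %>%
  group_by(group) %>%
  mutate(should_cf = cumsum(group^seq * th) / cummax(th)) %>%
  ungroup()

# Hand-written target values from the question, for comparison.
expected <- c(
  10^1*20.1/20.1,
  10^1*20.1/24.1 + 10^2*24.1/24.1,
  10^1*20.1/26.1 + 10^2*24.1/26.1 + 10^3*26.1/26.1,
  20^1*1.1/1.1,
  20^1*1.1/2.1 + 20^2*2.1/2.1,
  20^1*1.1/4.1 + 20^2*2.1/4.1 + 20^3*4.1/4.1
)

all.equal(closed_form$should_cf, expected)
```

One practical difference: 2:n() counts down to c(2, 1) when a group has a single row, so the accumulate version would need a guard for such groups, whereas the cumsum form handles them without special treatment.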