Search code examples
rdplyrdata-manipulation

Grouped diminishing weighted cumsum


Suppose I have this data

df <- tibble(
  id = c(rep("ID100", 3), 
             rep("ID200", 3), 
             rep("ID450", 3)), 
  year = 2021, 
  month = c(rep(c(5, 6, 7), 3)), 
  value = 5 
)

# A tibble: 9 × 4
  id     year month value
  <chr> <dbl> <dbl> <dbl>
1 ID100  2021     5     5
2 ID100  2021     6     5
3 ID100  2021     7     5
4 ID200  2021     5     5
5 ID200  2021     6     5
6 ID200  2021     7     5
7 ID450  2021     5     5
8 ID450  2021     6     5
9 ID450  2021     7     5

I would like to mutate a new column with cumsum() of column value, but with weights.

The weights starts from 1 and reduces by its half, as such 1, 0.5, 0.25, 0.125 and so on

Desired output:

# A tibble: 9 × 5
  id     year month value acc_sum
  <chr> <dbl> <dbl> <dbl>   <dbl>
1 ID100  2021     5     5    1.25 # 5 * 0.25
2 ID100  2021     6     5    4    # 5 * 0.5 + previous value of 1.25
3 ID100  2021     7     5    9    # 5 * 1 + previous value of 4
4 ID200  2021     5     5    1.25
5 ID200  2021     6     5    4   
6 ID200  2021     7     5    9   
7 ID450  2021     5     5    1.25
8 ID450  2021     6     5    4   
9 ID450  2021     7     5    9 

I have tried the following, but I feel like cumsum or accumulate with some modification would do a better job without hardcoding the weights

df %>%
  group_by(id, year) %>%
  arrange(id, year, month) %>%
  mutate(weights = last(value) +
           lag(value) * 0.5 +
           lag(value, 2) * 0.25)

Solution

  • Maybe you want something like this where you multiply by the (row number)-(row number)*0.75 per group using data.table:

    library(tibble)
    df <- tibble(
      id = c(rep("ID100", 3), 
             rep("ID200", 3), 
             rep("ID450", 3)), 
      year = 2021, 
      month = c(rep(c(5, 6, 7), 3)), 
      value = 5 
    )
    library(data.table)
    setDT(df)[, acc_sum := cumsum(value*(seq_len(.N)-seq_len(.N)*0.75)), by = "id"]
    df
    #>       id year month value acc_sum
    #> 1: ID100 2021     5     5    1.25
    #> 2: ID100 2021     6     5    3.75
    #> 3: ID100 2021     7     5    7.50
    #> 4: ID200 2021     5     5    1.25
    #> 5: ID200 2021     6     5    3.75
    #> 6: ID200 2021     7     5    7.50
    #> 7: ID450 2021     5     5    1.25
    #> 8: ID450 2021     6     5    3.75
    #> 9: ID450 2021     7     5    7.50
    

    Created on 2022-07-17 by the reprex package (v2.0.1)