I have an R dataset with columns:
date
country_code
version
money
The variable money
is numerical. The variables version
and country_code
are categorical. Each combination of date
, country_code
and version
defines a unique observation for which the value of money
is meassured.
I want to group the variables version
and country_code
and compute 3 new variables in the dataset which are:
money
along the previous 7 days.money_15
: For each observation, this variable stores the aggregation of the money
along the previous 15 daysmoney_total
: For each observation, this variable stores the aggregation of the money
up to the current date.The different combinations of country_code
and version
may have different start and end dates, and there also may missing dates.
data = tibble(
date = seq(as.Date('2022-01-01'), as.Date('2022-01-13'), by=1),
money = 1:13)
data = data[-c(2,3,4,10,11),]
# [Some computations for adding the expected new variables]
# A tibble: 10 × 2
date money money_week
2022-01-01 1 1
2022-01-05 5 6
2022-01-06 6 12
2022-01-07 7 19
2022-01-08 8 26
2022-01-09 9 35
2022-01-12 12 42
2022-01-13 13 49
Observe that each observation is not simply the sum of the previous 7 observations, but the sum of the previous 7 days. So for example, the observation on date 2022-01-08
is the sum of observation on dates 2022-01-08
, 2022-01-07
, 2022-01-06
, 2022-01-05
, and it would include observations 2022-01-04
, 2022-01-03
, 2022-01-02
but those last are missing.
You can use your date
variable as an index for a sliding window function.
library(slider)
library(tidyverse)
data %>%
mutate(money_week = slide_index_dbl(money, date, sum, .before = days(6)))
#> # A tibble: 8 × 3
#> date money money_week
#> <date> <int> <dbl>
#> 1 2022-01-01 1 1
#> 2 2022-01-05 5 6
#> 3 2022-01-06 6 12
#> 4 2022-01-07 7 19
#> 5 2022-01-08 8 26
#> 6 2022-01-09 9 35
#> 7 2022-01-12 12 42
#> 8 2022-01-13 13 49
money_15
set.seed(123)
newdata = tibble(
date = seq(as.Date('2022-01-01'), as.Date('2022-03-31'), by=1),
money = sample(1:20,90, replace = TRUE)) %>%
slice_sample(n = 60) %>%
arrange(date)
newdata %>%
mutate(money_week = slide_index_dbl(money, date, sum, .before = days(6)),
money_15 = slide_index_dbl(money, date, sum, .before = days(14)),
money_total = cumsum(money))
#> # A tibble: 60 × 5
#> date money money_week money_15 money_total
#> <date> <int> <dbl> <dbl> <int>
#> 1 2022-01-01 15 15 15 15
#> 2 2022-01-02 19 34 34 34
#> 3 2022-01-03 14 48 48 48
#> 4 2022-01-05 10 58 58 58
#> 5 2022-01-06 18 76 76 76
#> 6 2022-01-07 11 87 87 87
#> 7 2022-01-08 5 77 92 92
#> 8 2022-01-09 20 78 112 112
#> 9 2022-01-11 5 69 117 117
#> 10 2022-01-13 9 50 126 126
#> # ℹ 50 more rows