I need to create a variable that counts the number of observations that have occurred in the last 30 days for each id.
For example, imagine an observation that occurs on 1/2/2021 (d / m / y) for the id "a". If this observation is the first between 1/1/2021 and 1/2/2021 for the id "a" the variable must give 1. If it is the second, 2, etc.
Here is a larger example:
dat <- tibble::tribble(
~id, ~q, ~date,
"a", 1, "01/01/2021",
"a", 1, "01/01/2021",
"a", 1, "21/01/2021",
"a", 1, "21/01/2021",
"a", 1, "12/02/2021",
"a", 1, "12/02/2021",
"a", 1, "12/02/2021",
"a", 1, "12/02/2021",
"b", 1, "02/02/2021",
"b", 1, "02/02/2021",
"b", 1, "22/02/2021",
"b", 1, "22/02/2021",
"b", 1, "13/03/2021",
"b", 1, "13/03/2021",
"b", 1, "13/03/2021",
"b", 1, "13/03/2021")
dat$date <- lubridate::dmy(dat$date)
The result should be:
id q date newvar
a 1 01/01/2021 1
a 1 01/01/2021 2
a 1 21/01/2021 3
a 1 21/01/2021 4
a 1 12/02/2021 3
a 1 12/02/2021 4
a 1 12/02/2021 5
a 1 12/02/2021 6
b 1 02/02/2021 1
b 1 02/02/2021 2
b 1 22/02/2021 3
b 1 22/02/2021 4
b 1 13/03/2021 3
b 1 13/03/2021 4
b 1 13/03/2021 5
b 1 13/03/2021 6
Thank you very much.
With sapply
and between
, count the number of observations prior to the current observation that are within 30 days.
library(lubridate)
library(dplyr)
dat %>%
group_by(id) %>%
mutate(newvar = sapply(seq(length(date)),
function(x) sum(between(date[1:x], date[x] - days(30), date[x]))))
# A tibble: 16 x 4
# Groups: id [2]
id q date newvar
<chr> <dbl> <date> <int>
1 a 1 2021-01-01 1
2 a 1 2021-01-01 2
3 a 1 2021-01-21 3
4 a 1 2021-01-21 4
5 a 1 2021-02-12 3
6 a 1 2021-02-12 4
7 a 1 2021-02-12 5
8 a 1 2021-02-12 6
9 b 1 2021-02-02 1
10 b 1 2021-02-02 2
11 b 1 2021-02-22 3
12 b 1 2021-02-22 4
13 b 1 2021-03-13 3
14 b 1 2021-03-13 4
15 b 1 2021-03-13 5
16 b 1 2021-03-13 6