I want to calculate period-over-period ratios of a variable for every id.
For example, if I have 3 periods, 2 ids and a certain measure, as in this df:
df <- data.frame(
date = c(202001L, 202002L, 202003L, 202001L, 202002L, 202003L),
id = c("a", "a", "a", "b", "b", "b"),
value = c(23L, 43L, 123L, 56L, 23L, 13L))
I should be able to just do this:
library(dplyr)

df_mod <- df %>%
group_by(id) %>%
mutate(value_var_1 = (value - dplyr::lag(value, 1))/dplyr::lag(value, 1)) %>%
ungroup()
But if, for example, id "a" is missing date 202002, the lag would be taken between 202001 and 202003, which is no longer a lag of one period, as in this example:
df <- data.frame(
date = c(202001L, 202003L, 202001L, 202002L, 202003L),
id = c("a", "a", "b", "b", "b"),
value = c(23L, 123L, 56L, 23L, 13L)
)
df_mod <- df %>%
group_by(id) %>%
mutate(value_var_1 = (value - dplyr::lag(value, 1))/dplyr::lag(value, 1)) %>%
ungroup()
How can I make sure that id "a" has the 3 periods I need to make the calculations? How can I fill the missing dates with the last value?
You can use tidyr::complete to add the missing id/date combinations and then perform the calculations for each id:
library(dplyr)
df %>%
tidyr::complete(id, date = unique(date)) %>%
group_by(id) %>%
mutate(value_var_1 = (value - lag(value))/lag(value)) %>%
ungroup()
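Note that complete() leaves value as NA on the rows it adds, so any ratio that touches one of those rows is also NA. If the missing periods should instead carry the last observed value, as the question asks, one option is tidyr::fill. The following is a minimal sketch under that assumption, using the example data from the question; df_filled is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Example data from the question: id "a" has no row for 202002
df <- data.frame(
  date = c(202001L, 202003L, 202001L, 202002L, 202003L),
  id = c("a", "a", "b", "b", "b"),
  value = c(23L, 123L, 56L, 23L, 13L)
)

df_filled <- df %>%
  complete(id, date = unique(date)) %>%   # add missing id/date rows (value is NA there)
  arrange(id, date) %>%                   # make sure rows are in date order within each id
  group_by(id) %>%
  fill(value, .direction = "down") %>%    # carry the last observed value forward per id
  mutate(value_var_1 = (value - lag(value)) / lag(value)) %>%
  ungroup()
```

With this data, the added row for id "a" at 202002 takes the value 23 from 202001, so its ratio is 0 and the 202003 ratio is computed against 23. A gap at the start of an id stays NA, because there is no earlier value to carry forward.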
If every id covers a different date range, it is safer to convert date to the Date class and create a sequence of monthly dates for each id:
df %>%
mutate(date = as.Date(paste0(date, "01"), format = "%Y%m%d")) %>%
arrange(id, date) %>%
group_by(id) %>%
tidyr::complete(date = seq(min(date), max(date), by = 'month')) %>%
mutate(value_var_1 = (value - lag(value))/lag(value)) %>%
ungroup()