Search code examples
rdplyrpurrraccumulate

Backcasting series with year-on-year growth rate


I have a dataset containing GDP data and year-on-year growth rates. The growth rate series begins earlier than the GDP data, so I want to use them to recursively backcast the GDP series to extend its sample size.

Each missing value (e.g. 2021:Q4) should equal the value in the same quarter one year ahead (2022:Q4) divided by the year-on-year growth rate in that future period (i.e. x_t = x_{t+4} / g_{t+4}). Since each new value depends on a "future" value, I need a way to calculate this recursively.

I have created the desired output using a loop, but would prefer a solution which uses the purrr::accumulate function within a dplyr workflow.

library(dplyr)

# Dummy data set
set.seed(123)
df <- data.frame(date = seq(as.Date("2018/1/1"), by = "quarter", length.out = 20),
                 gdp = c(rep(NA, 16), cumprod(runif(4, 0.95, 1.05)) * 1000),
                 growth_rate = rnorm(5, mean = 3, sd = 1)) %>% 
  mutate(growth_rate = 1 + growth_rate/100) %>% 
  arrange(desc(date)) # Sorting the data descending to make the loop work

# for loop which gets a solution
for (i in 5:length(df$growth_rate)){
  df[i, "gdp"] <- df[i-4, "gdp"] / df[i-4, "growth_rate"]
}

Solution

  • In your example, you are treating the time series for each quarter separately, each with a different initial value from which you would like to extrapolate.

    However, accumulate() only accepts a single initial value as its .init argument. So you will have to split (or group) your data by month, calculate the GDP values based on the corresponding initial value, and finally combine the results.

    Below is an example of accumulate() for a single quarter to build intuition:

    # single quarter
    df1 <- df |> filter(lubridate::month(date) == 10)
    
    accumulate(df1$growth_rate,
               `/`,
               .init = df1$gdp[1])
    #> [1] 1036.0279 1001.3713  957.7120  929.1814  900.9869  860.4177
    

    Created on 2023-03-21 with reprex v2.0.2

    The entire workflow with a grouped dataframe could then look like this:

    library(lubridate)
    library(tidyverse)
    
    set.seed(123)
    df <- data.frame(date = seq(as.Date("2018/1/1"), by = "quarter", length.out = 20),
                     gdp = c(rep(NA, 16), cumprod(runif(4, 0.95, 1.05)) * 1000),
                     growth_rate = rnorm(5, mean = 3, sd = 1)) %>% 
      mutate(growth_rate = 1 + growth_rate/100) %>% 
      arrange(desc(date)) # Sorting the data descending to make the loop work
    
    df |>
      group_by(month = month(date)) |>
      mutate(new = accumulate(
        head(growth_rate, -1), `/`, .init = nth(gdp, 1)
      ))
    #> # A tibble: 20 × 5
    #> # Groups:   month [4]
    #>    date         gdp growth_rate month   new
    #>    <date>     <dbl>       <dbl> <dbl> <dbl>
    #>  1 2022-10-01 1036.        1.03    10 1036.
    #>  2 2022-07-01  998.        1.05     7  998.
    #>  3 2022-04-01 1007.        1.03     4 1007.
    #>  4 2022-01-01  979.        1.03     1  979.
    #>  5 2021-10-01   NA         1.05    10 1001.
    #>  6 2021-07-01   NA         1.03     7  953.
    #>  7 2021-04-01   NA         1.05     4  976.
    #>  8 2021-01-01   NA         1.03     1  950.
    #>  9 2020-10-01   NA         1.03    10  958.
    #> 10 2020-07-01   NA         1.05     7  921.
    #> 11 2020-04-01   NA         1.03     4  932.
    #> 12 2020-01-01   NA         1.05     1  921.
    #> 13 2019-10-01   NA         1.03    10  929.
    #> 14 2019-07-01   NA         1.03     7  881.
    #> 15 2019-04-01   NA         1.05     4  901.
    #> 16 2019-01-01   NA         1.03     1  879.
    #> 17 2018-10-01   NA         1.05    10  901.
    #> 18 2018-07-01   NA         1.03     7  855.
    #> 19 2018-04-01   NA         1.03     4  862.
    #> 20 2018-01-01   NA         1.05     1  850.
    

    Created on 2023-03-21 with reprex v2.0.2

    Of course, accumulate() is not needed here because there already is a function to compute the cumulative product (cumprod()) in base R:

    df |>
      group_by(month = month(date)) |>
      mutate(new = cumprod(c(nth(gdp, 1), head(growth_rate, -1)^(-1))))