
How to control the fill_gaps interval in tsibble?


I have two data frames whose gaps get filled at different intervals, and I would like to fill both at the same interval. Consider two data frames with the same month and day values, two years apart:

library(tidyverse)
library(fpp3)

df_2020 <- tibble(month_day = as_date(c('2020-1-1','2020-2-1','2020-3-1')),
                 amount = c(5, 2, 1))

df_2022 <- tibble(month_day = as_date(c('2022-1-1','2022-2-1','2022-3-1')),
                 amount = c(5, 2, 1))

These data frames both have three rows, with the same dates, 2 years apart.

Create tsibbles with a yearweek index:

ts_2020 <- df_2020 |> mutate(year_week = yearweek(month_day)) |>
  as_tsibble(index = year_week)
 
ts_2022 <- df_2022 |> mutate(year_week = yearweek(month_day)) |>
  as_tsibble(index = year_week)

ts_2020

#> # A tsibble: 3 x 3 [4W]
#>   month_day  amount year_week
#>   <date>      <dbl>    <week>
#> 1 2020-01-01      5  2020 W01
#> 2 2020-02-01      2  2020 W05
#> 3 2020-03-01      1  2020 W09

ts_2022

#> # A tsibble: 3 x 3 [1W]
#>   month_day  amount year_week
#>   <date>      <dbl>    <week>
#> 1 2022-01-01      5  2021 W52
#> 2 2022-02-01      2  2022 W05
#> 3 2022-03-01      1  2022 W09

Still three rows in each tsibble

Now fill gaps:

ts_2020_filled <- ts_2020 |> fill_gaps()

ts_2022_filled <- ts_2022 |> fill_gaps()

ts_2020_filled

#> # A tsibble: 3 x 3 [4W]
#>   month_day  amount year_week
#>   <date>      <dbl>    <week>
#> 1 2020-01-01      5  2020 W01
#> 2 2020-02-01      2  2020 W05
#> 3 2020-03-01      1  2020 W09

ts_2022_filled

#> # A tsibble: 10 x 3 [1W]
#>    month_day  amount year_week
#>    <date>      <dbl>    <week>
#>  1 2022-01-01      5  2021 W52
#>  2 NA             NA  2022 W01
#>  3 NA             NA  2022 W02
#>  4 NA             NA  2022 W03
#>  5 NA             NA  2022 W04
#>  6 2022-02-01      2  2022 W05
#>  7 NA             NA  2022 W06
#>  8 NA             NA  2022 W07
#>  9 NA             NA  2022 W08
#> 10 2022-03-01      1  2022 W09

Here is the issue: ts_2020_filled has 4-weekly steps, and ts_2022_filled has 1-weekly steps. This is because the two tsibbles have different intervals:

tsibble::interval(ts_2020)

#> <interval[1]>
#> [1] 4W

tsibble::interval(ts_2022)

#> <interval[1]>
#> [1] 1W

The intervals differ because the steps between consecutive index values differ:

ts_2020 |>
  pluck("year_week") |>
  diff()

#> Time differences in weeks
#> [1] 4 4

ts_2022 |> 
  pluck("year_week") |>
  diff()

#> Time differences in weeks
#> [1] 5 4

Therefore, the greatest common divisors of the steps are different (4 for ts_2020 and 1 for ts_2022). From the manual for as_tsibble:

regular: Regular time interval (TRUE) or irregular (FALSE). The interval is determined by the greatest common divisor of index column, if TRUE.
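
To make the greatest-common-divisor rule concrete, here is a minimal base R sketch that applies it to the step sizes printed above. The gcd() helper and the variable names are my own for illustration and are not part of tsibble, which derives the interval internally.

```r
# Hypothetical helper (not a tsibble function): Euclid's algorithm
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)

# Steps between consecutive index values, in weeks, copied from diff() above
steps_2020 <- c(4, 4)
steps_2022 <- c(5, 4)

# Fold the helper over each vector of steps
gcd_2020 <- Reduce(gcd, steps_2020)
gcd_2022 <- Reduce(gcd, steps_2022)

gcd_2020
#> [1] 4
gcd_2022
#> [1] 1
```

These match the 4W and 1W intervals reported for the two tsibbles.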

Both tsibbles are regular:

is_regular(ts_2020)

#> [1] TRUE

is_regular(ts_2022)

#> [1] TRUE

So, I would like to set the interval used for gap filling, so that the periods are consistent. I tried setting .full in fill_gaps and regular in as_tsibble, but I could not find a way to set the interval of a tsibble. Is there a way of manually setting the interval used by fill_gaps? Granted, an interval of four weeks won't work for df_2022, but an interval of one week (the greatest common divisor of the two) would work for both.


Solution

  • I think you're looking for the new_interval() function. A tsibble stores its interval in an interval attribute, which you can overwrite with an object created by new_interval().

    # set the interval of ts_2020 to 1 week to match ts_2022
    attr(ts_2020, 'interval') <- tsibble::new_interval(week = 1)
    
    ts_2020 |>
      tsibble::fill_gaps()
    #> # A tsibble: 9 x 3 [1W]
    #>   month_day  amount year_week
    #>   <date>      <dbl>    <week>
    #> 1 2020-01-01      5  2020 W01
    #> 2 NA             NA  2020 W02
    #> 3 NA             NA  2020 W03
    #> 4 NA             NA  2020 W04
    #> 5 2020-02-01      2  2020 W05
    #> 6 NA             NA  2020 W06
    #> 7 NA             NA  2020 W07
    #> 8 NA             NA  2020 W08
    #> 9 2020-03-01      1  2020 W09
    
    ts_2022 |>
      tsibble::fill_gaps()
    #> # A tsibble: 10 x 3 [1W]
    #>    month_day  amount year_week
    #>    <date>      <dbl>    <week>
    #>  1 2022-01-01      5  2021 W52
    #>  2 NA             NA  2022 W01
    #>  3 NA             NA  2022 W02
    #>  4 NA             NA  2022 W03
    #>  5 NA             NA  2022 W04
    #>  6 2022-02-01      2  2022 W05
    #>  7 NA             NA  2022 W06
    #>  8 NA             NA  2022 W07
    #>  9 NA             NA  2022 W08
    #> 10 2022-03-01      1  2022 W09
    

    Created on 2023-04-10 with reprex v2.0.2
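
If you would rather not modify the attribute after the fact, a possible alternative is to fix the interval when the tsibble is built. This is only a sketch: it assumes that build_tsibble() accepts an interval object through its interval argument, as its documentation describes, and the name ts_2020_weekly is made up for this example.

```r
library(dplyr)
library(tsibble)

df_2020 <- tibble(
  month_day = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  amount = c(5, 2, 1)
)

# Assumption: passing new_interval() to `interval` makes build_tsibble()
# use that interval as is instead of computing it from the index.
ts_2020_weekly <- df_2020 |>
  mutate(year_week = yearweek(month_day)) |>
  build_tsibble(index = year_week, interval = new_interval(week = 1))

# fill_gaps() should now step one week at a time
ts_2020_weekly_filled <- ts_2020_weekly |> fill_gaps()

interval(ts_2020_weekly)
nrow(ts_2020_weekly_filled)
```

With a one-week interval, the filled tsibble should cover 2020 W01 through 2020 W09, the same nine rows as in the output above.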