Search code examples
rdplyrinterpolationextrapolation

How to extrapolate values over dates using R


I have data for the first 7 months of a year, and would like to linearly extrapolate the data for the last 5 months. I have tried using the approxExtrap function from the Hmisc package but am not sure how to employ it, particularly what to specify for xout. I'm open to any solution (dplyr-based would be ideal if possible). Thank you for your time.

Here is a sample of the data:


df <- tibble(pop = c(18968, 18956, 18946, 18934, 18923, 18912, 18901, NA, NA, NA, NA, NA),
                       date = c("2020-01-01", "2020-02-01", "2020-03-01", 
                                "2020-04-01", "2020-05-01", "2020-06-01", 
                                "2020-07-01", "2020-08-01", "2020-09-01", 
                                "2020-10-01", "2020-11-01", "2020-12-01"))
df$date <- lubridate::as_date(df$date)


Solution

  • Run lm and predict and then use coalesce to combine the known and predicted values.

    library(dplyr)
    df %>%
      mutate(pop2 = coalesce(pop, predict(lm(pop ~ date), across(date))))
    

    giving the following where pop2 is pop with the NA's filled in with predicted values.

    # A tibble: 12 × 3
         pop date         pop2
       <dbl> <date>      <dbl>
     1 18968 2020-01-01 18968 
     2 18956 2020-02-01 18956 
     3 18946 2020-03-01 18946 
     4 18934 2020-04-01 18934 
     5 18923 2020-05-01 18923 
     6 18912 2020-06-01 18912 
     7 18901 2020-07-01 18901 
     8    NA 2020-08-01 18889.
     9    NA 2020-09-01 18878.
    10    NA 2020-10-01 18867.
    11    NA 2020-11-01 18856.
    12    NA 2020-12-01 18845.