I have data for the first 7 months of a year, and would like to linearly extrapolate the data for the last 5 months. I have tried using the approxExtrap function from the Hmisc package but am not sure how to employ it, particularly what to specify for xout. I'm open to any solution (dplyr-based would be ideal if possible). Thank you for your time.
Here is a sample of the data:
df <- tibble(pop = c(18968, 18956, 18946, 18934, 18923, 18912, 18901, NA, NA, NA, NA, NA),
date = c("2020-01-01", "2020-02-01", "2020-03-01",
"2020-04-01", "2020-05-01", "2020-06-01",
"2020-07-01", "2020-08-01", "2020-09-01",
"2020-10-01", "2020-11-01", "2020-12-01"))
df$date <- lubridate::as_date(df$date)
Run lm
and predict
and then use coalesce
to combine the known and predicted values.
library(dplyr)
df %>%
mutate(pop2 = coalesce(pop, predict(lm(pop ~ date), across(date))))
giving the following where pop2
is pop
with the NA's filled in with predicted values.
# A tibble: 12 × 3
pop date pop2
<dbl> <date> <dbl>
1 18968 2020-01-01 18968
2 18956 2020-02-01 18956
3 18946 2020-03-01 18946
4 18934 2020-04-01 18934
5 18923 2020-05-01 18923
6 18912 2020-06-01 18912
7 18901 2020-07-01 18901
8 NA 2020-08-01 18889.
9 NA 2020-09-01 18878.
10 NA 2020-10-01 18867.
11 NA 2020-11-01 18856.
12 NA 2020-12-01 18845.