I have a dataset like below:
Date Metric1 Metric2 Metric3 Metric4
2017-01-01 NA 3 NA 7
2017-01-02 NA 4 NA 10
2017-01-03 NA 2 NA 18
2017-01-04 5 8 NA 20
2017-01-05 8 9 87 34
2017-01-06 10 2 45 12
. . . . .
. . . . .
. . . . .
2018-09-01 12 13 14 15
2018-09-02 34 12 28 19
2018-09-03 45 12 45 34
2018-09-04 NA 14 49 11
2018-09-05 NA 11 90 12
2018-09-06 NA 15 NA 32
2018-09-07 NA 23 NA 43
2018-09-08 NA 12 NA 22
My dataset has 100 columns. There are no missing values in between the NA
s in their respective columns. Does anyone know a package or a function that will forecast or use a moving average for the values before and after the first or last numeric value?
I have done some research on this so far and the best I can find is na.fill
but that will just repeat values at the beginning and end of columns.
You can use the imputeTS
package to impute the missing values. For moving average you can do something like:
library(imputeTS)
ts_df[,2:5] <- apply(ts_df[,2:5], 2, na_ma, k = 6) # k = width of moving average
ts_df
Date Metric1 Metric2 Metric3 Metric4
1 2017-01-01 6.933333 3 64.57143 7
2 2017-01-02 7.806452 4 62.13333 10
3 2017-01-03 8.396825 2 61.58065 18
4 2017-01-04 5.000000 8 61.38095 20
5 2017-01-05 8.000000 9 87.00000 34
6 2017-01-06 10.000000 2 45.00000 12
7 2018-09-01 12.000000 13 14.00000 15
8 2018-09-02 34.000000 12 28.00000 19
9 2018-09-03 45.000000 12 45.00000 34
10 2018-09-04 33.984127 14 49.00000 11
11 2018-09-05 34.451613 11 90.00000 12
12 2018-09-06 35.333333 15 66.80952 32
13 2018-09-07 37.142857 23 67.16129 43
14 2018-09-08 41.333333 12 68.93333 22
Refer R documentation for more time series related imputation techniques in imputeTS package.
Data:
ts_df <- read.table(text = " Date Metric1 Metric2 Metric3 Metric4
2017-01-01 NA 3 NA 7
2017-01-02 NA 4 NA 10
2017-01-03 NA 2 NA 18
2017-01-04 5 8 NA 20
2017-01-05 8 9 87 34
2017-01-06 10 2 45 12
2018-09-01 12 13 14 15
2018-09-02 34 12 28 19
2018-09-03 45 12 45 34
2018-09-04 NA 14 49 11
2018-09-05 NA 11 90 12
2018-09-06 NA 15 NA 32
2018-09-07 NA 23 NA 43
2018-09-08 NA 12 NA 22" , header = T, colClasses = c("Date" = "Date"))