Tags: python, r, pandas, time-series, forecasting

Expanding Window (time series forecasting) in R


Hi, I am new to Stack Overflow as well as to R. I am currently taking a course on Machine Learning & Deep Learning using RStudio and Python. The course also covers forecasting, but the instructor implements that code only in Python. In the feature engineering part of forecasting, he builds an expanding-window feature with pandas. Can someone help me write the equivalent code in R?

The code he uses in Python is:

    Feature['Expand_Max'] = df['births'].expanding().max()

The dataset looks like this before running this code:

    date        births  year  month  day  lag1  lag2  Roll_mean  Roll_max
    1959-01-01      35  1959      1    1    NA    NA         NA        NA
    1959-01-02      32  1959      1    2    35    NA       33.5        NA
    1959-01-03      30  1959      1    3    32    NA       31.0        35
    1959-01-04      31  1959      1    4    30    NA       30.5        32
    1959-01-05      44  1959      1    5    31    NA       37.5        44
    1959-01-06      29  1959      1    6    44    NA       36.5        44
    1959-01-07      45  1959      1    7    29    NA       37.0        45
    1959-01-08      43  1959      1    8    45    NA       44.0        45
    1959-01-09      38  1959      1    9    43    NA       40.5        45
    1959-01-10      27  1959      1   10    38    NA       32.5        43

The dataset looks like this after running this code:

    date        births  year  month  day  lag1  lag2  Roll_mean  Roll_max  Expand_Max
    1959-01-01      35  1959      1    1    NA    NA         NA        NA          35
    1959-01-02      32  1959      1    2    35    NA       33.5        NA          35
    1959-01-03      30  1959      1    3    32    NA       31.0        35          35
    1959-01-04      31  1959      1    4    30    NA       30.5        32          35
    1959-01-05      44  1959      1    5    31    NA       37.5        44          44
    1959-01-06      29  1959      1    6    44    NA       36.5        44          44
    1959-01-07      45  1959      1    7    29    NA       37.0        45          45
    1959-01-08      43  1959      1    8    45    NA       44.0        45          45
    1959-01-09      38  1959      1    9    43    NA       40.5        45          45
    1959-01-10      27  1959      1   10    38    NA       32.5        43          45
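For reference, here is a minimal pandas sketch that rebuilds the Expand_Max column from the ten births values shown above. It assumes a stand-in df with only the births column (the course's Feature frame is not reconstructed). With the default min_periods=1, the expanding window for row t covers rows 0..t, so expanding().max() is simply the running maximum, the same thing cummax() computes in pandas and in base R.

```python
import pandas as pd

# Hypothetical stand-in for the course's DataFrame: only the ten births
# values from the tables above are reconstructed.
df = pd.DataFrame({"births": [35, 32, 30, 31, 44, 29, 45, 43, 38, 27]})

# Expanding window: the window for row t is rows 0..t, so .max() gives the
# largest births value seen so far (returned as float64).
df["Expand_Max"] = df["births"].expanding().max()

# With the default min_periods=1 this equals a cumulative maximum,
# which is what base R's cummax() computes.
assert df["Expand_Max"].tolist() == df["births"].cummax().tolist()

print(df["Expand_Max"].tolist())
# [35.0, 35.0, 35.0, 35.0, 44.0, 44.0, 45.0, 45.0, 45.0, 45.0]
```

The printed values match the Expand_Max column in the table; pandas just returns them as floats.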

Solution

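  • You can use zoo::na.locf with fromLast = TRUE, which fills each NA in Roll_max with the next non-NA value after it (it carries observations backward), and then cummax, which returns the cumulative maximum at every point. Store the result in a new Expand_Max column so that Roll_max is kept, as in the desired output. (Because the Python code takes the expanding maximum of births itself, cummax(df$births) is the more direct equivalent; the Roll_max route matches here because the value back-filled into the first two rows, 35, happens to equal the expanding maximum there.)

    # back-fill the leading NAs of Roll_max (requires the zoo package),
    # then take the running maximum
    df$Expand_Max <- cummax(zoo::na.locf(df$Roll_max, fromLast = TRUE))
    df$Expand_Max
    #[1] 35 35 35 35 44 44 45 45 45 45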
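The two routes can be compared with a small pandas sketch. It assumes, from the sample values, that Roll_max is a 3-row rolling maximum of births (rolling(3).max() reproduces that column for these rows). bfill() stands in for zoo::na.locf(..., fromLast = TRUE), and cummax() for R's cummax(). The helper names expanding_max and via_roll_max are hypothetical. On the question's data both routes agree, but on a series whose first value is not the largest of the first three, the back-filled route overstates the first rows.

```python
import pandas as pd


def expanding_max(births: pd.Series) -> pd.Series:
    # Direct route: running maximum (pandas expanding().max(), R cummax()).
    return births.cummax()


def via_roll_max(births: pd.Series) -> pd.Series:
    # Assumption: Roll_max is a 3-row rolling max of births.
    roll_max = births.rolling(3).max()
    # bfill() plays the role of zoo::na.locf(..., fromLast = TRUE):
    # the two leading NaNs take the next non-missing value.
    return roll_max.bfill().cummax()


sample = pd.Series([35, 32, 30, 31, 44, 29, 45, 43, 38, 27])
print(expanding_max(sample).tolist())  # [35, 35, 35, 35, 44, 44, 45, 45, 45, 45]
print(via_roll_max(sample).tolist())   # same values, as floats

# Counterexample: the first value is not the largest of the first three.
counter = pd.Series([30, 32, 35, 31])
print(expanding_max(counter).tolist())  # [30, 32, 35, 35]
print(via_roll_max(counter).tolist())   # [35.0, 35.0, 35.0, 35.0]
```

In R terms, this suggests preferring cummax(df$births) whenever the goal is the pandas expanding().max() feature.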