Hi, I am new to Stack Overflow as well as R. I am currently taking a course on machine learning and deep learning using RStudio and Python. The instructor also teaches forecasting, but he implements that code only in Python. In the feature engineering part of forecasting, he builds an expanding window feature, which is part of pandas. Can someone help me find the equivalent code in R?
The code he uses in Python is: Feature['Expand_Max'] = df['births'].expanding().max()
The dataset looks like this before running this code:
date | births | year | month | day | lag1 | lag2 | Roll_mean | Roll_max |
---|---|---|---|---|---|---|---|---|
1959-01-01 | 35 | 1959 | 1 | 1 | NA | NA | NA | NA |
1959-01-02 | 32 | 1959 | 1 | 2 | 35 | NA | 33.5 | NA |
1959-01-03 | 30 | 1959 | 1 | 3 | 32 | NA | 31.0 | 35 |
1959-01-04 | 31 | 1959 | 1 | 4 | 30 | NA | 30.5 | 32 |
1959-01-05 | 44 | 1959 | 1 | 5 | 31 | NA | 37.5 | 44 |
1959-01-06 | 29 | 1959 | 1 | 6 | 44 | NA | 36.5 | 44 |
1959-01-07 | 45 | 1959 | 1 | 7 | 29 | NA | 37.0 | 45 |
1959-01-08 | 43 | 1959 | 1 | 8 | 45 | NA | 44.0 | 45 |
1959-01-09 | 38 | 1959 | 1 | 9 | 43 | NA | 40.5 | 45 |
1959-01-10 | 27 | 1959 | 1 | 10 | 38 | NA | 32.5 | 43 |
The dataset looks like this after running this code:
date | births | year | month | day | lag1 | lag2 | Roll_mean | Roll_max | Expand_Max |
---|---|---|---|---|---|---|---|---|---|
1959-01-01 | 35 | 1959 | 1 | 1 | NA | NA | NA | NA | 35 |
1959-01-02 | 32 | 1959 | 1 | 2 | 35 | NA | 33.5 | NA | 35 |
1959-01-03 | 30 | 1959 | 1 | 3 | 32 | NA | 31.0 | 35 | 35 |
1959-01-04 | 31 | 1959 | 1 | 4 | 30 | NA | 30.5 | 32 | 35 |
1959-01-05 | 44 | 1959 | 1 | 5 | 31 | NA | 37.5 | 44 | 44 |
1959-01-06 | 29 | 1959 | 1 | 6 | 44 | NA | 36.5 | 44 | 44 |
1959-01-07 | 45 | 1959 | 1 | 7 | 29 | NA | 37.0 | 45 | 45 |
1959-01-08 | 43 | 1959 | 1 | 8 | 45 | NA | 44.0 | 45 | 45 |
1959-01-09 | 38 | 1959 | 1 | 9 | 43 | NA | 40.5 | 45 | 45 |
1959-01-10 | 27 | 1959 | 1 | 10 | 38 | NA | 32.5 | 43 | 45 |
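You can use zoo::na.locf with fromLast = TRUE, which fills each NA with the next non-NA value in the column (so the leading NAs in Roll_max are backfilled); cummax then returns the cumulative maximum at every point. Note that this matches the expanding maximum here only because the first births value is also the largest of the first three; cummax(df$births) computes it directly from the raw column.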
df$Expand_Max <- cummax(zoo::na.locf(df$Roll_max, fromLast = TRUE))
df$Expand_Max
#[1] 35 35 35 35 44 44 45 45 45 45
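To make it clearer what the pandas expanding().max() call is computing, here is a minimal plain-Python sketch using only the standard library. The helper names expanding_max and backfilled_rolling_max are hypothetical, the births values are copied from the question's table, and the rolling window of 3 is an assumption inferred from the Roll_max column. It suggests that the expanding maximum is just a running maximum over the raw column, which is what cummax(df$births) should give in R, and that backfilling a rolling maximum first can give a different result on other data.

```python
from itertools import accumulate

# births column copied from the sample table in the question
births = [35, 32, 30, 31, 44, 29, 45, 43, 38, 27]


def expanding_max(values):
    """Running maximum: element i is the max of values[0..i]."""
    return list(accumulate(values, max))


def backfilled_rolling_max(values, window=3):
    """Rolling max over `window` values; the leading gaps are filled
    with the first complete window (assumes len(values) >= window)."""
    rolled = [
        max(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]
    return [rolled[0]] * (window - 1) + rolled


# On the question's data, both routes agree.
expand_max = expanding_max(births)
via_roll_max = expanding_max(backfilled_rolling_max(births))
print(expand_max)
print(via_roll_max)

# On a made-up rising series they differ in the first rows,
# because the backfill pulls a later maximum to the front.
rising = [30, 32, 35, 31]
print(expanding_max(rising))
print(expanding_max(backfilled_rolling_max(rising)))
```

So the approach through Roll_max works for this dataset, but taking the cumulative maximum of births itself is the closer match to the pandas line.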