I am struggling to get the maximum value of a variable over the last year of observations (not per calendar year!) and assign it to each row (observation).
I think the best way to do this is with the rollapply function, but I cannot figure out what the width should look like, since it varies from observation to observation (each observation represents a day, but not every day has an observation). I know that passing a list makes rollapply treat the values as offsets, so what should these values look like?
The code I have so far:
mutate(data,"Feature"=rollapplyr(variable,list(0,"Go back one year"),max,fill=NA))
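A small sketch of what a list width means in zoo's rollapplyr, based on the zoo documentation (the toy vector x is hypothetical). Each list element holds offsets relative to the current position, counted in observations rather than in days, so list(-2:0) covers the current row and the two rows before it. Because offsets count rows, a single fixed offset list cannot express "one year back" when some days have no observation; the window length has to vary from row to row.

```r
library(zoo)  # rollapplyr()

x <- c(4, 9, 2, 1, 3, 5)

# width given as a list of offsets: -2:0 means "two rows back up to the current row"
by_offsets <- rollapplyr(x, list(-2:0), max, fill = NA)

# the same window expressed as a plain width of 3 observations
by_width <- rollapplyr(x, 3, max, fill = NA)

print(by_offsets)
print(by_width)
```

The first two positions are NA because a full three-row window does not exist there yet and fill = NA pads them.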
To clarify with an example: a row has the date 31/8/2016. I want the new column (created with mutate from the dplyr package) to show in this row the maximum value of variable from 31/8/2015 to 31/8/2016 (this row).
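To pin down the window boundaries, here is a sketch (variable names are illustrative) that computes the start of the one-year window for the example date with base R's seq.Date, which accepts a negative step such as "-1 year". This particular window spans 366 days because it contains 29/2/2016, so a fixed day count only approximates a calendar year.

```r
# the example row's date
d <- as.Date("2016-08-31")

# the same date one calendar year earlier: seq() from d, stepping back one year
start <- seq(d, by = "-1 year", length.out = 2)[2]

# number of days between the window start and the row's date
span_days <- as.numeric(d - start)

print(start)
print(span_days)
```

A row whose date satisfies start <= date <= d would then belong to the window for this row.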
For those who want to go further: instead of displaying the value of variable, display TRUE or FALSE (or 1/0) depending on whether the calculated maximum is above a threshold value.
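A minimal sketch of the threshold step, assuming the rolling maximum is already available as a numeric vector; both the values and the threshold of 500 are hypothetical. A comparison yields TRUE/FALSE, and as.integer() turns that into 1/0.

```r
# hypothetical rolling maxima (e.g. the new column) and a hypothetical threshold
rolling_max <- c(120, 450, 450, 980, 980)
threshold <- 500

above <- rolling_max > threshold  # TRUE/FALSE
above01 <- as.integer(above)      # 1/0

print(above)
print(above01)
```

In dplyr style this would be something like mutate(data, Above = Feature > threshold), assuming Feature already holds the rolling maximum.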
It is difficult to answer without further details, but see if this is what you need:
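library(zoo)  # rollapplyr()
data <- data.frame(Data = seq.Date(as.Date("2001-01-01"), as.Date("2005-12-31"), by = "month"),
                   Var = sample(1:1000, 60, TRUE))
# exclude some rows so that the dates are irregularly spaced
data <- data[-c(10, 15, 17:21), ]
# using a for loop: max of Var over the 365 days ending at (and including) each row's date
for (i in 1:nrow(data)) {
  data$Max[i] <- max(data[data$Data > (data$Data[i] - 365) & data$Data <= data$Data[i], "Var"])
}
# using rollapply (assumes the rows are sorted by date)
# first count the observations that fall in the one-year window ending at each row
for (i in 1:nrow(data)) {
  data$Oneyear[i] <- sum(data$Data > (data$Data[i] - 365) & data$Data <= data$Data[i])
}
# then use these counts as per-row widths; rollapplyr aligns each window to end at its row
data$Maxr <- rollapplyr(data$Var, data$Oneyear, max)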
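The two for loops can also be replaced by a vectorized width computation. This is a sketch, assuming the dates are sorted ascending with at most one row per date; one_year_width is a hypothetical helper name and the toy data are made up. Base R's findInterval(x, vec) returns, for each x, how many elements of the sorted vec are <= x, so subtracting that count from the row index gives the number of observations in (date - 365, date].

```r
library(zoo)  # rollapplyr()

# Hypothetical helper: for each row, count the observations whose date lies in
# (date - days, date]. Assumes `dates` is sorted ascending with one row per date.
one_year_width <- function(dates, days = 365) {
  d <- as.numeric(dates)
  # findInterval() gives, for each d - days, how many dates are <= it
  seq_along(d) - findInterval(d - days, d)
}

# made-up example data
dates <- as.Date(c("2015-01-01", "2015-06-01", "2016-01-01", "2016-03-01", "2017-02-01"))
var   <- c(5, 9, 3, 7, 2)

w <- one_year_width(dates)              # one window width per row
rolling_max <- rollapplyr(var, w, max)  # right-aligned: each window ends at its row

print(w)
print(rolling_max)
```

In the question's dplyr style, the same call should fit inside mutate, e.g. mutate(data, Feature = rollapplyr(Var, one_year_width(Data), max)), provided data is sorted by Data.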
Running set.seed(123) before generating the data, you should get:
> tail(data)
         Data Var Oneyear Max Maxr
55 2005-07-01 561      12 858  858
56 2005-08-01 207      12 858  858
57 2005-09-01 128      12 858  858
58 2005-10-01 754      12 858  858
59 2005-11-01 896      12 896  896
60 2005-12-01 375      12 896  896
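If you would rather avoid computing widths (or the zoo dependency), the first loop can also be written without an explicit for loop. A sketch in base R; window_max is a hypothetical helper and the toy data are made up. It compares every row with every other row, so the work grows as O(n^2), which is fine for moderate data sizes; unlike the width-based approach it does not need the dates to be sorted.

```r
# Hypothetical base-R helper: maximum of `values` over (date - days, date] for each row.
window_max <- function(dates, values, days = 365) {
  vapply(seq_along(dates), function(i) {
    # rows whose date falls in the window ending at row i (row i is always included)
    in_window <- dates > dates[i] - days & dates <= dates[i]
    max(values[in_window])
  }, numeric(1))
}

# made-up example data
dates  <- as.Date(c("2015-01-01", "2015-06-01", "2016-01-01", "2016-03-01", "2017-02-01"))
values <- c(5, 9, 3, 7, 2)

res <- window_max(dates, values)
print(res)
```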