I need to create a new column with the median value of the past 6 months (180 days) for each ID, considering the actual value. If there is no information or the previous record is > 6 months, the median value must be the value of that row.
Input Data
I have this:
structure(list(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4), value = c(956,
986, 995, 995, 986, 700, 600, 995, 956, 1000, 986), date = structure(c(15601,
17075, 10965, 11068, 11243, 14610, 15248, 15342, 15344, 15380,
16079), class = "Date")), .Names = c("id", "value", "date"), row.names = c(NA, -11L), class = "data.frame")
What I have to achieve is this:
structure(list(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4), value = c(956,
986, 995, 995, 986, 700, 600, 995, 956, 1000, 986), date = structure(c(15601,
17075, 10965, 11068, 11243, 14610, 15248, 15342, 15344, 15380,
16079), class = "Date"), median = c(956,986,995,995,990,700,600,797.5,956,975.5, 986)), .Names = c("id", "value", "date", "median"), row.names = c(NA, -11L), class = "data.frame")
I have tried to use rollaplyr and rollmeadian from zoo package to follow the answers provide in this post Finding Cumulative Sum In R Using Conditions
But I can't come to the good result.
Thank you in advanced
Try this solution:
Split the data.frame
by id
using function split
:
list_df<-split(df,f=df$id)
Function providing the median on a single id
value with date
condition:
f_median<-function(i,db)
{
return(median(db[as.POSIXct(db[,"date"])>=as.POSIXct(db[i,"date"]-180) & as.POSIXct(db[,"date"])<=as.POSIXct(db[i,"date"]),"value"]))
}
Iteration over splitted data.frame:
f<-function(db)
{
return(sapply(rep(1:nrow(db)),f_median,db))
}
Your desired output
median<-unlist(lapply(list_df,f))
cbind(df,median)
id value date median
1 1 956 2012-09-18 956.0
2 2 986 2016-10-01 986.0
31 3 995 2000-01-09 995.0
32 3 995 2000-04-21 995.0
33 3 986 2000-10-13 990.5
41 4 700 2010-01-01 700.0
42 4 600 2011-10-01 600.0
43 4 995 2012-01-03 797.5
44 4 956 2012-01-05 956.0
45 4 1000 2012-02-10 975.5
46 4 986 2014-01-09 986.0