Search code examples
rtimedplyrzoomedian

Calculate median value over the past X months, for each row and ID in R


I need to create a new column with the median value of the past 6 months (180 days) for each ID, considering the actual value. If there is no information or the previous record is > 6 months, the median value must be the value of that row.

Input Data

I have this:

structure(list(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4), value = c(956, 
986, 995, 995, 986, 700, 600, 995, 956, 1000, 986), date = structure(c(15601, 
17075, 10965, 11068, 11243, 14610, 15248, 15342, 15344, 15380, 
16079), class = "Date")), .Names = c("id", "value", "date"), row.names = c(NA, -11L), class = "data.frame")

What I have to achieve is this:

structure(list(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4), value = c(956, 
986, 995, 995, 986, 700, 600, 995, 956, 1000, 986), date = structure(c(15601, 
17075, 10965, 11068, 11243, 14610, 15248, 15342, 15344, 15380, 
16079), class = "Date"), median = c(956,986,995,995,990,700,600,797.5,956,975.5, 986)), .Names = c("id", "value", "date", "median"), row.names = c(NA, -11L), class = "data.frame")

I have tried to use rollaplyr and rollmeadian from zoo package to follow the answers provide in this post Finding Cumulative Sum In R Using Conditions

But I can't come to the good result.

Thank you in advanced


Solution

  • Try this solution:

    Split the data.frame by id using function split:

    list_df<-split(df,f=df$id)
    

    Function providing the median on a single id value with date condition:

    f_median<-function(i,db)
    {
      return(median(db[as.POSIXct(db[,"date"])>=as.POSIXct(db[i,"date"]-180) & as.POSIXct(db[,"date"])<=as.POSIXct(db[i,"date"]),"value"]))
    }
    

    Iteration over splitted data.frame:

    f<-function(db)
    {
       return(sapply(rep(1:nrow(db)),f_median,db))
    }
    

    Your desired output

     median<-unlist(lapply(list_df,f))
     cbind(df,median)
       id value       date median
    1   1   956 2012-09-18  956.0
    2   2   986 2016-10-01  986.0
    31  3   995 2000-01-09  995.0
    32  3   995 2000-04-21  995.0
    33  3   986 2000-10-13  990.5
    41  4   700 2010-01-01  700.0
    42  4   600 2011-10-01  600.0
    43  4   995 2012-01-03  797.5
    44  4   956 2012-01-05  956.0
    45  4  1000 2012-02-10  975.5
    46  4   986 2014-01-09  986.0