
Rolling Median with subset building over time


I would like to compute a variant of a rolling median on my dataset that builds each subset not from the k observations before and after a point, but from all observations that fall within a given time window.

A straightforward implementation could look like this:

windowwidth <- 30
# median of all observations whose time lies within windowwidth / 2 of x
median.window <- function(x) median(mydata[time <= x + windowwidth / 2 & time >= x - windowwidth / 2])
vapply(time, median.window, numeric(1))

However, as you can imagine, this is not very efficient for large datasets. Do you see a possible improvement, or a package providing an optimized implementation? Note that the observations cannot be assumed to be evenly spaced in time.

zoo provides rollmedian, but that function defines the window by observation count, not by time.


Solution

  • Ok, try this:

    Rgames: timeseq<-1:5 
    Rgames: winmat <- outer(timeseq,timeseq,FUN=function(x,y) y>=x &y<=x+2) 
    Rgames: winmat 
          [,1]  [,2]  [,3]  [,4]  [,5] 
    [1,]  TRUE  TRUE  TRUE FALSE FALSE 
    [2,] FALSE  TRUE  TRUE  TRUE FALSE 
    [3,] FALSE FALSE  TRUE  TRUE  TRUE 
    [4,] FALSE FALSE FALSE  TRUE  TRUE 
    [5,] FALSE FALSE FALSE FALSE  TRUE 
    Rgames: winmat %*% timeseq 
         [,1] 
    [1,]    6 
    [2,]    9 
    [3,]   12 
    [4,]    9 
    [5,]    5 
    

    Replace the function passed to outer with one that encodes your own window width and I think you'll be all set.
    Edit: In response to Thilo's query, it looks like in the general case you should use apply. Building on the example above, call your observation values "timval":

    Rgames: timval<-c(3,4,2,6,1)
    Rgames: valmat<-timval*t(winmat)
    Rgames: valmat
         [,1] [,2] [,3] [,4] [,5]
    [1,]    3    0    0    0    0
    [2,]    4    4    0    0    0
    [3,]    2    2    2    0    0
    [4,]    0    6    6    6    0
    [5,]    0    0    1    1    1
    Rgames: apply(valmat,2,median)
    [1] 2 2 1 0 0
    

    Edit again: clearly I was asleep there: nobody wants a median based on all those zeroes. I should think more before posting. Add this (note that it also discards any observation whose value really is 0):

    Rgames: valmat[valmat==0] <- NA
    Rgames: apply(valmat, 2, median, na.rm=TRUE)
    [1] 3.0 4.0 2.0 3.5 1.0
    

    And I'm sure there's a cleaner way of 'building' valmat than this, but the end result is the "filter matrix" to which you can apply any function.
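
One possibly cleaner way to use the filter matrix is to index the values with each row of the logical mask instead of multiplying by it, so no zeroes are introduced and genuine zero observations are kept. The following is only a minimal sketch under stated assumptions: the data (`time`, `values`), the window width, and the names `inwin` and `rolling_median` are hypothetical and not from the original post, and the window is assumed to be centred on each time point as in the question.

```r
# Hypothetical sample data: irregularly spaced times and their observed values
time   <- c(1, 2, 4, 10, 11, 30)
values <- c(3, 0, 2, 6, 1, 5)
windowwidth <- 4  # total width of the window, centred on each time point

# inwin[i, j] is TRUE when observation j lies in the window around time[i]
inwin <- outer(time, time, function(x, y) abs(y - x) <= windowwidth / 2)

# Select the values with each row of the mask; a real 0 stays in the subset
rolling_median <- vapply(
  seq_along(time),
  function(i) median(values[inwin[i, ]]),
  numeric(1)
)

rolling_median
```

Since every observation lies in its own window, no subset is ever empty. The mask still needs one entry per pair of observations, so for very long series it may be necessary to compute each row inside the loop rather than store the whole matrix.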