I've been using the gregmisc library to compute a rolling decile ranking.
Let's say I have a vector 'X' of 1000 continuous values and I apply my function with a look-back window of 250 (which is what I use).
My current function works as follows: the first 250 records are ranked against each other and get values between 1 and 10. Record 251 is then ranked using the values in X[2:251], record 252 using X[3:252], and so on.
While it does the trick faster than a loop, the performance of gregmisc's "running" function with my decile ranking function leaves much to be desired.
I've been speeding up my other functions by operating over the entire time series at once, essentially building columns of the information needed at each point in time. That approach has cut processing time by as much as 95%, but I have not come up with a similar solution for this problem.
Matrices may work more quickly, but I haven't seen it done well enough to beat my running version.
Any ideas?
Thanks!
Here is the code I'm using: one core function, then a function that applies it with running from gregmisc:
F_getDecileVal <- function(x, deciles = 0.1) {
  len <- length(x)
  y <- array(0, dim = len)
  # Break points at 0%, 10%, ..., 100% of the values in x
  deciles <- seq(0, 1, deciles)
  decileBounds <- quantile(x, deciles, na.rm = TRUE)
  lendecile <- length(decileBounds)
  # Assign bin i - 1 to values between consecutive bounds; a value equal to a
  # bound ends up in the higher bin because later iterations overwrite earlier ones
  for (i in 2:lendecile) {
    y[which(x <= decileBounds[[i]] & x >= decileBounds[[i - 1]])] <- (i - 1)
  }
  # Reverse order so the top decile (1) has the largest values; NAs stay 0
  ranked <- which(y > 0)
  y[ranked] <- lendecile - y[ranked]
  return(y)
}
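The binning loop itself can probably be replaced by a single vectorized call. The following is only a sketch: decile_rank_vec is a made-up name, and it assumes base R's findInterval is an acceptable stand-in for the loop. It should give the same ranks for non-missing values, but it returns NA for NA inputs where the loop version leaves 0.

```r
# Hypothetical loop-free variant of the binning step (the name is made up here).
decile_rank_vec <- function(x, step = 0.1) {
  # Same break points as the loop version: 0%, 10%, ..., 100%
  bounds <- quantile(x, seq(0, 1, step), na.rm = TRUE, names = FALSE)
  # findInterval() returns the bin index of each value;
  # rightmost.closed = TRUE keeps the maximum in the last bin
  bin <- findInterval(x, bounds, rightmost.closed = TRUE)
  # Reverse so that 1 marks the largest values (NA inputs stay NA)
  length(bounds) - bin
}

ranks <- decile_rank_vec(1:100)
```

This only speeds up the per-window work; the window is still recomputed at every step.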
And the rolling function:
F_getDecileVal_running <- function(x, decilecut = 0.1, interval) {
  len <- length(x)
  # Modified by ML 5/4/2013
  y <- array(NA, dim = len)
  if (len >= interval) {
    y <- running(x, fun = F_getDecileVal, width = interval, records = 1, pad = TRUE, simplify = TRUE)
    y[1:interval] <- F_getDecileVal(x[1:interval])
  }
  return(y)
}
# system.time(F_getDecileVal_running(mydata[,8],interval=250))
# > dim(mydata)
# [1] 5677 9
#user system elapsed
# 4.28 0.00 4.38
If you can accept a definition of 'decile' that is not the default in R's quantile function (but is one of the possible choices, I think type=6), then you can probably just use sort
and extract the 26th, 51st, 76th, ... items, up to either the 226th or the 250th, depending on whether you also want the min and max or just the inner decile "hinges". The rollapply
function in the zoo package is designed for rolling function application, and I think it will be more useful in the long run than gregmisc::running
since it is part of a suite of functions for time series. This more minimal example returns just the min, median and max for a simple series:
x <- 1:1000
require(zoo)
rollapply(x[1:300], 250, function(x) sort(x)[ c(1, 125, 250) ] )
     [,1] [,2] [,3]
[1,]    1  125  250
[2,]    2  126  251
[3,]    3  127  252
[4,]    4  128  253
[5,]    5  129  254
[6,]    6  130  255
[7,]    7  131  256
(I snipped the remaining rows of the 51-row output matrix.)
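Putting the two suggestions together, a rolling decile rank of the newest value in each window might look roughly like the sketch below. It is only a sketch under some assumptions: last_decile_rank is a made-up helper name, the data contain no NAs, the window length is a multiple of 10, and the hinges are plain order statistics, so ranks right at a boundary can differ from what quantile's default would give.

```r
library(zoo)

# Hypothetical helper: decile rank (1 = largest values, 10 = smallest) of the
# newest observation in a window, using sorted order statistics as hinges.
last_decile_rank <- function(w) {
  n <- length(w)
  # Inner hinges: for n = 250 these are the 26th, 51st, ..., 226th sorted items
  hinge_idx <- 1 + (1:9) * (n %/% 10)
  hinges <- sort(w)[hinge_idx]
  # findInterval() counts how many hinges are <= the newest value (0 to 9)
  10 - findInterval(w[n], hinges)
}

set.seed(1)
x <- rnorm(1000)
# align = "right" ranks each record against the 250 values ending at it;
# fill = NA pads the first 249 positions, which have no full window yet
ranks <- rollapply(x, width = 250, FUN = last_decile_rank,
                   align = "right", fill = NA)
```

Only one sort per window is done here, and only the last value is ranked, which is all the rolling version needs after the first 250 records.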