
R: Create a factor variable based on generic rolling deciles as dataset grows


I can create a factor variable for the deciles of my data using the code below, which takes into account the whole history:

`q <- quantile(x, seq(0,1,0.1), na.rm = TRUE)
decilab <- c("1st","2nd","3rd","4th","5th","6th","7th","8th","9th","10th")
q.factor <- cut(x, unique(q), include.lowest = TRUE, labels = decilab)`

However, I need to make a generic cut into deciles on a rolling basis, only accounting for the history that is prior to the point being labeled. This code below uses a for loop to calculate rolling quantile as 9 distinct variables, but I'm not sure how to translate that into a single factor variable (nor do I particularly want/need these variables to exist).

`# Preallocate the nine decile-boundary vectors before filling them in the loop
D1 <- D2 <- D3 <- D4 <- D5 <- D6 <- D7 <- D8 <- D9 <- numeric(length(x))
for(i in seq_along(x)){
   D1[i] <- quantile(x[1:i],0.1, na.rm = TRUE)
   D2[i] <- quantile(x[1:i],0.2, na.rm = TRUE)
   D3[i] <- quantile(x[1:i],0.3, na.rm = TRUE)
   D4[i] <- quantile(x[1:i],0.4, na.rm = TRUE)
   D5[i] <- quantile(x[1:i],0.5, na.rm = TRUE)
   D6[i] <- quantile(x[1:i],0.6, na.rm = TRUE)
   D7[i] <- quantile(x[1:i],0.7, na.rm = TRUE)
   D8[i] <- quantile(x[1:i],0.8, na.rm = TRUE)
   D9[i] <- quantile(x[1:i],0.9, na.rm = TRUE)
}`

There has to be a better way! Thank you for your help, and my apologies if this is a common problem - I haven't found anything so far.

Edit: Apologies as I am new to Stack Overflow and R. I think I have a better example, but I'm not sure how to resubmit this question.

Suppose you have the vector `x <- 1:1000`. The goal is to cut this data into deciles with `cut(x, quantile(x, seq(0,1,0.1)), include.lowest = TRUE)`. However, this cuts the whole series x into fixed buckets (1 to 100, 101 to 200, and so on). My goal is for the bucketing to vary, based only on the preceding data rather than the whole vector. So essentially every single point would be in the "top decile" because this series is linear; for a stochastic series, though, the decile of the latest point is determined only relative to the preceding points, not the whole series.

I tried the following:

`for (i in 1:length(x)){
    z[i] <- as.numeric(cut(x[1:i], quantile(x[1:i], seq(0,1,.1))))[i]
 } `

However, that doesn't work.


Solution

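  • `library(dplyr)
    x <- 1:1000
    # Preallocate the result instead of growing it inside the loop
    y <- numeric(length(x))
    for (i in seq_along(x)){
       # Bucket the history up to point i into 10 groups; keep the latest point's bucket
       y[i] <- last(ntile(x[1:i], 10))
    }`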
    

    This appears to work!

    Advice from a colleague was that dplyr::ntile was superior to cut.
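
For anyone who wants the result as a labelled factor without depending on dplyr, here is a minimal base-R sketch. It is not the same computation as `ntile`: it assigns each point a decile from the share of the history so far (including the point itself) that is less than or equal to it, so early points may land in different buckets than `ntile` would give. The name `rolling_decile` is made up for illustration; `decilab` is the label vector from the question.

```r
# Hypothetical helper (not from dplyr or base R): decile of each point
# relative to the history up to and including that point.
rolling_decile <- function(x, n = 10L) {
  vapply(seq_along(x), function(i) {
    if (is.na(x[i])) return(NA_integer_)
    hist <- x[seq_len(i)]
    hist <- hist[!is.na(hist)]            # ignore missing history
    k <- sum(hist <= x[i])                # how many values so far are <= current
    as.integer(ceiling(n * k / length(hist)))
  }, integer(1))
}

decilab <- c("1st","2nd","3rd","4th","5th","6th","7th","8th","9th","10th")

x <- 1:1000
d <- rolling_decile(x)

# Single factor variable, one label per observation
q.factor <- factor(d, levels = seq_len(10), labels = decilab)
table(q.factor)
```

For the strictly increasing series `x <- 1:1000`, every point is the largest value seen so far, so every observation falls in the "10th" level, which matches the behaviour described in the question. Like the loop above, this rescans the history at every step, so it is quadratic in the length of `x`.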