Search code examples
rdplyrdata.tableminimum

How to find position of running minimum (runMin) in a vector in R?


As an example I have a vector v of length 300 with stock price data for 300 days:

v = c(seq(20,30,length.out=60), seq(29.8,12.8, length.out=45), 
seq(13,18.4, length.out=40),seq(18.6,5.04, length.out=20), 
seq(5,9.5, length.out=33), seq(9.7,5.9, length.out=33), 
seq(6,12,length.out=50), seq(12.1,2, length.out=19))

For EACH day in v I find the minimum price for a certain range (in my example 50-100 days ago) which I accomplish using lag (from dplyr) and runMin from TTR-package:

w = TTR::runMin(dplyr::lag(v,51),50)

What I am interested in: for each day in the data, how many days ago was the minimum? I.e. for each position in vector v, how much time has passed since the minimum for the range 50-100 days ago (as stored in w)?

I came across an interesting answer here, but I couldn't apply it to a running minimum. An efficient solution using dplyr or data.table would be ideal, as it shall be applied to a data set containing millions of data points.


Solution

  • As mentioned in my comment, one approach on you simple vector v would be:

    rnge <- c(50, 100)
    
    want <- c(rep(NA, rnge[2]), 
      sapply((rnge[2]+1):length(v), \(x) 
             (rnge[2]+1) - which.min(v[(x-rnge[2]):(x-(rnge[1]+1))]))
      )
    

    As you mentioned in your comment:

    it seems to produce the correct output. Thank you for the suggestion. What worries me a little: I have to apply this to a data.table with >4 million observations comprising about 2000 different groups - each group has to be assessed. Do you have an idea how to wrap this code up nicely (e.g. dplyr or data.table) to be applied to several different groups.

    You can adapt this to a data table in a similar way. Since the sample data are not a data table, here are some sample data with 5 groups:

    library(data.table)
    dt <- data.table(
      group_id = rep(1:5, each=length(v)), 
      time = rep(1:length(v), 5),  
      price = rep(v, 5)  
    )
    

    To adapt the sapply(...) code above to data table:

    dt[, days_ago := {
      res <- rep(NA_real_, .N)
      if (.N > 100) {
        res[(rnge[2]+1):.N] <- sapply((rnge[2]+1):.N, \(x) (rnge[2]+1) - which.min(price[(x-rnge[2]):(x-(rnge[1]+1))]))
      }
      res
    }, by = group_id]
    

    Note others here on SO are much more proficient in data table than I am, so this may not be the most optimized approach!