
Maximum Intermediate Volatility


I have two vectors, a and b. See attached.

a is the signal and is a probability. b is the absolute percentage change in the next period.

Signalt <- seq(0, 1, 0.05)

I would like to find the maximum absolute return occurring within each intermediate 5%-tile (Signalt) of the a vector. So if a is

  0.01, 0.02, 0.03, 0.06, 0.07

then it should calculate the maximum return between

     0.01 and 0.02, 
     0.01 and 0.03, 
     0.02 and 0.03. 

Then it should move on to

     0.06 and 0.07, and so on.

Output would then be combined in a matrix or table when the entire sequence has run.

It should follow the index from vector a and b.

i is an index that is updated by one every time that a crosses into a new percentile. τ(i) is the bucket associated with the ith cross.

a is the probability vector, which has length tau. This vector should be analyzed in its 5%-tiles, with the maximum intermediate absolute return as the output. The price change of the next period is the vector b, which is represented by P in the equation below. l and m are indices.

Every time Signal moves from one 5% tile to another, we compute the largest absolute return that occurs between any two intermediate buckets, until Signal moves to another 5% tile. For example, suppose that Signal moves into the 85th percentile and 4 volume buckets later moves into the 90th percentile. We would then calculate absolute returns between buckets 1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, 3 and 4. We are interested in the maximum absolute return. We would then calculate the max return in the following percentile bucket, move on to the next, which could be an 85th percentile and so on. So we let i be an index that is updated by 1 every time that Signal moves from one percentile into another, and τ(i) the bucket associated with the ith cross.

This is the equation I am using. The notation might vary slightly. [equation image]

Now my question is how to go about this. Perhaps someone has an intuitive solution to this. I hope my question is clear.

"a","b"
0,0.013013698630137
0,0.0013522650439487
0,0.00135409614082593
0,0.00203389830508471
0.27804813511593,0.00135317997293627
0.300237801284318,0
0.495965075167796,0.00405405405405412
0.523741892051237,0.000672947510094168
0.558753750296458,0.00202020202020203
0.665762829019002,0.000672043010752743
0.493106479913899,0.000671591672263272
0.344592579573497,0.000672043010752854
0.336263897823707,0.00201748486886366
0.35884763774257,0.00536912751677865
0.23662807979007,0.00133511348464632
0.212636893966841,0.00267379679144386
0.362212830513403,0.000666666666666593
0.319216408413927,0.00333555703802535
0.277670854167344,0
0.310143323100971,0
0.374104373036218,0.00267737617135211
0.190943075221511,0.00268456375838921
0.165770070508112,0.00200803212851386
0.240310208616952,0.00133600534402145
0.212418038918236,0.00200133422281523
0.204282022136019,0.00200534759358306
0.363725074298064,0.000667111407605114
0.451807761954326,0.000666666666666593
0.369296011692801,0.000666222518321047
0.37503495989363,0.0026666666666666
0.323386355686901,0.00132978723404265
0.189216171830472,0.00266311584553924
0.185252052821193,0.00199203187250996
0.174882909380997,0.000662690523525522
0.149291525540782,0.00132625994694946
0.196824215268048,0.00264900662251666
0.164611993131396,0.000660501981505912
0.125470998266484,0.00132187706543285
0.179999532586703,0.00264026402640272
0.368749638521621,0.000658327847267826
0.427799340926225,0

Solution

  • My interpretation of the question

    I hope I understand your question correctly. Here is what I understood:

    1. For each row you compute which 5% band (percentile) it falls into
    2. Whenever that percentile changes, you start a new bucket
    3. All rows in the same bucket are reduced to a single value
    4. If there is only a single row in a bucket, the b value from that row is the resulting value
    5. Otherwise, you compute all abs(b[l]/b[m]-1) where m<l and both belong to the same bucket, and the resulting value is the maximum of these
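
Steps 1 and 2 can be tried out on the five signal values from the example in the question. This is only a sketch: `a_example` is an illustrative name, and `findInterval()` on the question's `Signalt` grid is used here as one possible way to find the 5% band, assuming each band includes its lower edge.

```r
# Signal values taken from the example in the question
a_example <- c(0.01, 0.02, 0.03, 0.06, 0.07)

# 5% grid as defined in the question
Signalt <- seq(0, 1, 0.05)

# Step 1: upper edge (in percent) of the 5% band each value falls into
percentile <- findInterval(a_example, Signalt) * 5

# Step 2: start a new bucket whenever the band changes
bucket <- cumsum(c(1, diff(percentile) != 0))

data.frame(a = a_example, percentile, bucket)
```

The first three values end up in one bucket and the last two in a second one, which matches the grouping described in the question.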

    Basic answer

    Code

    This code does what I described above:

    # read the data (shortened, full data in OP)
    d <- read.table(textConnection("a,b
    0,0.013013698630137
    […]
    0.427799340926225,0
    "), sep=",", header=TRUE)
    
    # compute percentile number for each line    
    d$percentile <- floor(d$a/0.05)*5 + 5
    
    # start a new bucket whenever the percentile changes
    d$bucket <- cumsum(c(1, diff(d$percentile) != 0))
    
    # compute a single number for all rows of the same bucket
    aggregate(b ~ percentile + bucket, d, function(b) {
      if (length(b) == 1) return(b)  # special case of only a single row
      # m[i, j] = abs(b[j]/b[i] - 1), i.e. all pairs are compared
      m <- outer(b, b, function(pm, pl) abs(pl/pm - 1))
      return(max(m[upper.tri(m)]))  # only consider pairs with m < l
    })
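
One caveat: `abs(pl/pm - 1)` divides by `pm`, and `b` contains zeros. In the data from the question every zero sits alone in its bucket, so the single-row case handles it, but a zero inside a longer bucket would produce `Inf` or `NaN`. A possible guard, as a sketch only (the name `max_abs_return` and the choice to ignore undefined ratios are my assumptions, not part of the question):

```r
# Hypothetical helper: like the anonymous function above, but ratios with a
# zero denominator (Inf or NaN) are ignored instead of dominating the maximum.
max_abs_return <- function(b) {
  if (length(b) == 1) return(b)            # single row: keep b as is
  m <- outer(b, b, function(pm, pl) abs(pl / pm - 1))
  r <- m[upper.tri(m)]                     # pairs with m < l only
  r <- r[is.finite(r)]                     # drop Inf and NaN
  if (length(r) == 0) return(NA_real_)     # no defined ratio in this bucket
  max(r)
}

# The pair (0, 0.003) would give Inf and is skipped here
max_abs_return(c(0.002, 0, 0.003))
```

It could be passed to `aggregate()` in place of the anonymous function. Whether dropping those pairs is appropriate depends on what a zero change should mean for your measure.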
    

    Output

    The result will look like this:

       percentile bucket            b
    1           5      1 0.8960891071
    2          30      2 0.0013531800
    3          35      3 0.0000000000
    4          50      4 0.0040540541
    5          55      5 0.0006729475
    6          60      6 0.0020202020
    7          70      7 0.0006720430
    8          50      8 0.0006715917
    9          35      9 2.0020174849
    10         40     10 0.0053691275
    11         25     11 1.0026737968
    12         40     12 0.0006666667
    13         35     13 0.0033355570
    14         30     14 0.0000000000
    15         35     15 0.0000000000
    16         40     16 0.0026773762
    17         20     17 0.2520080321
    18         25     18 0.5010026738
    19         40     19 0.0006671114
    20         50     20 0.0006666667
    21         40     21 3.0026666667
    22         35     22 0.0013297872
    23         20     23 0.7511597084
    24         15     24 0.0013262599
    25         20     25 0.7506605020
    26         15     26 0.0013218771
    27         20     27 0.0026402640
    28         40     28 0.0006583278
    29         45     29 0.0000000000
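
The first row can be reproduced by hand from the four rows with a = 0, which serves as a sanity check of the pairwise logic (the variable name `b1` is just for illustration):

```r
# b values of the first four rows of the data (all with a = 0, i.e. bucket 1)
b1 <- c(0.013013698630137, 0.0013522650439487,
        0.00135409614082593, 0.00203389830508471)

# m[i, j] = abs(b1[j] / b1[i] - 1); the upper triangle holds pairs with i < j
m <- outer(b1, b1, function(pm, pl) abs(pl / pm - 1))
first_bucket_max <- max(m[upper.tri(m)])
first_bucket_max
```

The maximum comes from the pair formed by the first and second row, abs(0.00135/0.01301 - 1), which is about 0.896.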
    

    Additional columns

    Code

    If you also want to know the number of items in each bucket, then I suggest you use the plyr package:

    library(plyr)
    
    aggB <- function(b) {
      if(length(b) == 1) return(b)
      m <- outer(b, b, function(pm, pl) abs(pl/pm - 1))
      return(max(m[upper.tri(m)]))
    }
    
    ddply(d, .(bucket), summarise,
          percentile = percentile[1], n = length(b), maxr = aggB(b))
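
If you would rather avoid the extra dependency, roughly the same table can probably be assembled in base R. The following is a sketch, not a drop-in replacement: the helper name `summarise_buckets` is made up for this example, and only the first six rows of the question's data are used (as `d6`) to keep it short.

```r
# Same aggregation function as in the plyr version
aggB <- function(b) {
  if (length(b) == 1) return(b)
  m <- outer(b, b, function(pm, pl) abs(pl / pm - 1))
  max(m[upper.tri(m)])
}

# Hypothetical helper: one output row per bucket, built with split() and rbind()
summarise_buckets <- function(d) {
  rows <- lapply(split(d, d$bucket), function(g) {
    data.frame(bucket = g$bucket[1], percentile = g$percentile[1],
               n = nrow(g), maxr = aggB(g$b))
  })
  do.call(rbind, rows)
}

# First six rows of the data from the question
d6 <- data.frame(
  a = c(0, 0, 0, 0, 0.27804813511593, 0.300237801284318),
  b = c(0.013013698630137, 0.0013522650439487, 0.00135409614082593,
        0.00203389830508471, 0.00135317997293627, 0)
)
d6$percentile <- floor(d6$a / 0.05) * 5 + 5
d6$bucket <- cumsum(c(1, diff(d6$percentile) != 0))

res <- summarise_buckets(d6)
res
```

On these six rows the result has three buckets with 4, 1 and 1 rows, in line with the first three rows of the plyr output.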
    

    Output

    This will give you the following result:

       bucket percentile n         maxr
    1       1          5 4 0.8960891071
    2       2         30 1 0.0013531800
    3       3         35 1 0.0000000000
    4       4         50 1 0.0040540541
    5       5         55 1 0.0006729475
    6       6         60 1 0.0020202020
    7       7         70 1 0.0006720430
    8       8         50 1 0.0006715917
    9       9         35 2 2.0020174849
    10     10         40 1 0.0053691275
    11     11         25 2 1.0026737968
    12     12         40 1 0.0006666667
    13     13         35 1 0.0033355570
    14     14         30 1 0.0000000000
    15     15         35 1 0.0000000000
    16     16         40 1 0.0026773762
    17     17         20 2 0.2520080321
    18     18         25 3 0.5010026738
    19     19         40 1 0.0006671114
    20     20         50 1 0.0006666667
    21     21         40 2 3.0026666667
    22     22         35 1 0.0013297872
    23     23         20 3 0.7511597084
    24     24         15 1 0.0013262599
    25     25         20 2 0.7506605020
    26     26         15 1 0.0013218771
    27     27         20 1 0.0026402640
    28     28         40 1 0.0006583278
    29     29         45 1 0.0000000000