I have two vectors, a and b. See attached.
a
is the signal and is a probability.
b
is the absolute percentage change the next period.
Signalt <- seq(0, 1, 0.05)
I would like to find the maximum absolute return occuring within each intermediate 5%-tile (Signalt
) of the a vector. So if it is
0.01, 0.02, 0.03, 0.06 0.07
then it should calculate the maximum return between
0.01 and 0.02,
0.01 and 0.03,
0.02 and 0.03.
Then move on to
0.06 and 0.07 do it over etc.
Output would then be combined in a matrix or table when the entire sequence has run.
It should follow the index from vector a and b.
i
is an index that is updated by one every time that a
crosses into a new percentile. t(i)
is the bucket associated with the i
th cross.
a
is the probability vector which has length tao. This vector should be analyzed in its 5% tiles, with the maximum intermediate absolute return being the output. The price change of next period is the vector b
. This would be represented by P in the equation below.
l
and m
are indexes.
Every time Signal moves from one 5% tile to another, we compute the largest absolute return that occurs between any two intermediate buckets, until Signal moves to another 5% tile. For example, suppose that Signal moves into the 85th percentile and 4 volume buckets later moves into the 90th percentile. We would then calculate absolute returns between buckets 1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, 3 and 4. We are interested in the maximum absolute return. We would then calculate the max return in the following percentile bucket, move on to the next, which could be an 85th percentile and so on. So we let i be an index that is updated by 1 every time that Signal moves from one percentile into another, and τ(i) the bucket associated with the ith cross.
This is the equation I am using. The notation might vary slightly.
Now my question is how to go about this. Perhaps someone has an intuitive solution to this. I hope my question is clear.
"a","b"
0,0.013013698630137
0,0.0013522650439487
0,0.00135409614082593
0,0.00203389830508471
0.27804813511593,0.00135317997293627
0.300237801284318,0
0.495965075167796,0.00405405405405412
0.523741892051237,0.000672947510094168
0.558753750296458,0.00202020202020203
0.665762829019002,0.000672043010752743
0.493106479913899,0.000671591672263272
0.344592579573497,0.000672043010752854
0.336263897823707,0.00201748486886366
0.35884763774257,0.00536912751677865
0.23662807979007,0.00133511348464632
0.212636893966841,0.00267379679144386
0.362212830513403,0.000666666666666593
0.319216408413927,0.00333555703802535
0.277670854167344,0
0.310143323100971,0
0.374104373036218,0.00267737617135211
0.190943075221511,0.00268456375838921
0.165770070508112,0.00200803212851386
0.240310208616952,0.00133600534402145
0.212418038918236,0.00200133422281523
0.204282022136019,0.00200534759358306
0.363725074298064,0.000667111407605114
0.451807761954326,0.000666666666666593
0.369296011692801,0.000666222518321047
0.37503495989363,0.0026666666666666
0.323386355686901,0.00132978723404265
0.189216171830472,0.00266311584553924
0.185252052821193,0.00199203187250996
0.174882909380997,0.000662690523525522
0.149291525540782,0.00132625994694946
0.196824215268048,0.00264900662251666
0.164611993131396,0.000660501981505912
0.125470998266484,0.00132187706543285
0.179999532586703,0.00264026402640272
0.368749638521621,0.000658327847267826
0.427799340926225,0
I hope I understand your question correctly. Here is what I understood:
b
value from that row is the resulting valueabs(b[l]/b[m]-1)
where m<l
and both belong to the same bucketThis code here does what I describe above:
# read the data (shortened, full data in OP)
d <- read.table(textConnection("a,b
0,0.013013698630137
[…]
0.427799340926225,0
"), sep=",", header=TRUE)
# compute percentile number for each line
d$percentile <- floor(d$a/0.05)*5 + 5
# start a new bucket whenever the percentile changes
d$bucket <- cumsum(c(1, diff(d$percentile) != 0))
# compute a single number for all rows of the same bucket
aggregate(b ~ percentile + bucket, d, function(b) {
if(length(b) == 1) return(b); # special case of only a single row
m <- outer(b, b, function(pm, pl) abs(pl/pm - 1)) # compare all pairs
return(max(m[upper.tri(m)])) # only return pairs with m < l
})
The result will look like this:
percentile bucket b
1 5 1 0.8960891071
2 30 2 0.0013531800
3 35 3 0.0000000000
4 50 4 0.0040540541
5 55 5 0.0006729475
6 60 6 0.0020202020
7 70 7 0.0006720430
8 50 8 0.0006715917
9 35 9 2.0020174849
10 40 10 0.0053691275
11 25 11 1.0026737968
12 40 12 0.0006666667
13 35 13 0.0033355570
14 30 14 0.0000000000
15 35 15 0.0000000000
16 40 16 0.0026773762
17 20 17 0.2520080321
18 25 18 0.5010026738
19 40 19 0.0006671114
20 50 20 0.0006666667
21 40 21 3.0026666667
22 35 22 0.0013297872
23 20 23 0.7511597084
24 15 24 0.0013262599
25 20 25 0.7506605020
26 15 26 0.0013218771
27 20 27 0.0026402640
28 40 28 0.0006583278
29 45 29 0.0000000000
If you also want to know the number of items in each group, then I suggest you use the plyr
library:
library(plyr)
aggB <- function(b) {
if(length(b) == 1) return(b)
m <- outer(b, b, function(pm, pl) abs(pl/pm - 1))
return(max(m[upper.tri(m)]))
}
ddply(d, .(bucket), summarise,
percentile = percentile[1], n = length(b), maxr = aggB(b))
This will give you the following result:
bucket percentile n maxr
1 1 5 4 0.8960891071
2 2 30 1 0.0013531800
3 3 35 1 0.0000000000
4 4 50 1 0.0040540541
5 5 55 1 0.0006729475
6 6 60 1 0.0020202020
7 7 70 1 0.0006720430
8 8 50 1 0.0006715917
9 9 35 2 2.0020174849
10 10 40 1 0.0053691275
11 11 25 2 1.0026737968
12 12 40 1 0.0006666667
13 13 35 1 0.0033355570
14 14 30 1 0.0000000000
15 15 35 1 0.0000000000
16 16 40 1 0.0026773762
17 17 20 2 0.2520080321
18 18 25 3 0.5010026738
19 19 40 1 0.0006671114
20 20 50 1 0.0006666667
21 21 40 2 3.0026666667
22 22 35 1 0.0013297872
23 23 20 3 0.7511597084
24 24 15 1 0.0013262599
25 25 20 2 0.7506605020
26 26 15 1 0.0013218771
27 27 20 1 0.0026402640
28 28 40 1 0.0006583278
29 29 45 1 0.0000000000