R Parabolic SAR and Look-Ahead Bias


I am testing Parabolic SAR in R using the SAR() function from the great TTR package by Joshua Ulrich, and I am not sure whether what I see is standard behavior of Parabolic SAR. If it is, I would need some help implementing a "future-blind" SAR.

To make things simple I will work with short vectors and integer values instead of real time series data.

library(TTR)

L <- c(1:4, 5)  # lows
H <- c(2:5, 6)  # highs

# SAR() expects the High column first, then Low
ParSAR <- SAR(cbind(H, L))
cbind(L, H, ParSAR)

     L H   ParSAR
[1,] 1 2 1.000000
[2,] 2 3 1.000000
[3,] 3 4 1.080000
[4,] 4 5 1.255200
[5,] 5 6 1.554784

Now I change only one value, in the last interval, where the Low - High range becomes 5 - 7 instead of 5 - 6.

L <- c(1:4, 5)
H <- c(2:5, 7)  # only the last high differs
ParSAR <- SAR(cbind(H, L))
cbind(L, H, ParSAR)

We get:

     L H    ParSAR
[1,] 1 2 0.5527864
[2,] 2 3 0.5817307
[3,] 3 4 0.6784614
[4,] 4 5 0.8777538
[5,] 5 7 1.2075335

Is it expected behavior that the entire history of Parabolic SAR changes so dramatically? If the SAR values in rows 1 to 4 are changed by a different future value in row 5, that introduces look-ahead bias into the prior rows.

If this is standard behavior of Parabolic SAR and I need it for a backtest, I will have to recompute it for every row, each time masking all future data (rows).

The desired result is a Parabolic SAR value for each row as I could have witnessed it at that particular moment in time, not knowing the future.
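
The row-by-row recomputation described above could be sketched roughly as follows. The helper future_blind() is a hypothetical name, not part of TTR: it evaluates an indicator on rows 1..i only and keeps the last value, so row i cannot depend on later rows. Starting at row 2 is an assumption made here so that a standard deviation over the window is defined. The TTR call is guarded in case the package is not installed.

```r
# Hypothetical helper (not part of TTR): evaluate `indicator` on rows 1..i
# only and keep its last value, so row i never sees rows after i.
future_blind <- function(hl, indicator, min_rows = 2L) {
  hl <- as.matrix(hl)
  out <- rep(NA_real_, nrow(hl))
  if (nrow(hl) < min_rows) return(out)
  for (i in seq(min_rows, nrow(hl))) {
    vals <- indicator(hl[seq_len(i), , drop = FALSE])
    out[i] <- as.numeric(vals)[length(vals)]  # last value of the window
  }
  out
}

# Usage with the second example from the question (High first, then Low).
if (requireNamespace("TTR", quietly = TRUE)) {
  H <- c(2:5, 7)
  L <- c(1:4, 5)
  print(cbind(L, H, ParSAR = future_blind(cbind(H, L), TTR::SAR)))
}
```

This costs one full SAR() call per row, which is quadratic in the number of rows, but each value is computed from the data available at that moment only.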


EDIT 2016-06-18

Simplified code example for user3666197:

> SAR(cbind(c(2, 3, 4, 5, 6), c(1, 2, 3, 4, 5)), c(0.02, 0.2))
[1] 1.000000 1.000000 1.080000 1.255200 1.554784

> SAR(cbind(c(2, 3, 4, 5, 7), c(1, 2, 3, 4, 5)), c(0.02, 0.2))
[1] 0.5527864 0.5817307 0.6784614 0.8777538 1.2075335

Solution

  • The R implementation of Parabolic SAR (SAR() in TTR) comes with a look-ahead bias.

    The initGap value is the standard deviation of the High - Low range over the entire series, so it depends on every row, including future ones:

    initGap <- sd(drop(coredata(HL[, 1] - HL[, 2])), na.rm = TRUE)
    

    References: https://github.com/joshuaulrich/TTR/issues/23

    The drastic impact on my original example is caused by the short data sample and the extreme values used.
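
A quick base-R check of this explanation, as a sketch: it recomputes the same standard deviation for the two examples from the question and assumes that the first SAR value is the first low minus initGap. That assumption is not taken from the TTR documentation, but it is consistent with the outputs above, since it reproduces the first-row values 1.000000 and 0.5527864. The name init_gap() is introduced here for illustration only.

```r
# High and low columns of the two examples from the question.
H1 <- c(2, 3, 4, 5, 6); L1 <- c(1, 2, 3, 4, 5)
H2 <- c(2, 3, 4, 5, 7); L2 <- c(1, 2, 3, 4, 5)

# Same quantity as initGap in the snippet above, for plain numeric vectors.
init_gap <- function(high, low) sd(high - low, na.rm = TRUE)

gap1 <- init_gap(H1, L1)  # every range is 1, so the deviation is 0
gap2 <- init_gap(H2, L2)  # the last range is 2, so the deviation is positive

# Assumption: the first SAR value is the first low minus initGap.
first_sar1 <- L1[1] - gap1
first_sar2 <- L2[1] - gap2
print(c(first_sar1, first_sar2))
```

With only five rows, a single wider bar moves the standard deviation from 0 to about 0.447, which is why the whole SAR history shifts so visibly here.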