```python
from datetime import datetime

# pandas.io.data has been removed from pandas; DataReader now lives in the
# separate pandas-datareader package (its "yahoo" source may no longer work)
from pandas_datareader.data import DataReader
from numpy import asarray, cumsum, log, polyfit, sqrt, std, subtract
from numpy.random import randn


def hurst(ts):
    """Returns the Hurst Exponent of the time series vector ts"""
    # Work on a plain array so slicing is positional
    # (np.subtract on two pandas Series would align them on their index)
    ts = asarray(ts, dtype=float)
    # Create the range of lag values
    lags = range(2, 100)
    # Calculate the array of the variances of the lagged differences
    # Here it calculates the variances, but why does it use the
    # standard deviation and then take a square root of it?
    tau = [sqrt(std(subtract(ts[lag:], ts[:-lag]))) for lag in lags]
    # Use a linear fit to estimate the Hurst Exponent
    poly = polyfit(log(lags), log(tau), 1)
    # Return the Hurst exponent from the polyfit output
    return poly[0] * 2.0


# Download the stock price series from Yahoo
aapl = DataReader("AAPL", "yahoo", datetime(2012, 1, 1), datetime(2015, 9, 18))
# Call the function
hurst(aapl['Adj Close'])
```
In this code for estimating the Hurst exponent, when we want to calculate the variance of the lagged differences, why do we use the standard deviation and then take its square root? I have been confused about this for a long time, and I don't know why nobody else seems to be. Am I misunderstanding the math behind it? Thanks!
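The snippet imports `cumsum` and `randn` but never uses them. Since the Yahoo download may no longer work, here is a minimal sketch of a sanity check on synthetic data instead, assuming a simulated random walk (for which H = 0.5) and white noise (strongly mean-reverting; the variance of its lagged differences does not grow with the lag, so H comes out near 0) are reasonable test cases. The function body is the question's `hurst` with `asarray` added; the seed and series length are arbitrary choices.

```python
import numpy as np
from numpy import cumsum, log, polyfit, sqrt, std, subtract
from numpy.random import randn


def hurst(ts):
    """The question's estimator: slope of log(sqrt(std)) vs log(lag), times 2."""
    ts = np.asarray(ts, dtype=float)
    lags = range(2, 100)
    tau = [sqrt(std(subtract(ts[lag:], ts[:-lag]))) for lag in lags]
    poly = polyfit(log(lags), log(tau), 1)
    return poly[0] * 2.0


np.random.seed(0)           # arbitrary fixed seed for reproducibility
noise = randn(100_000)      # i.i.d. standard normal steps

random_walk = cumsum(noise)  # Brownian-like: expect H close to 0.5
white_noise = noise          # mean-reverting: expect H close to 0

h_rw = hurst(random_walk)
h_wn = hurst(white_noise)
print(f"random walk: H = {h_rw:.3f}")
print(f"white noise: H = {h_wn:.3f}")
```

If the function returns roughly 0.5 for the random walk and roughly 0 for the white noise, the combination of `sqrt(std(...))` with the final `* 2.0` is producing a sensible Hurst estimate, even if the reason is not obvious from the code.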
I'm just as confused. I don't understand where the sqrt of std comes from either, and I have spent 3 days trying to figure it out. In the end I noticed that QuantStart credits Dr Tom Starke, who uses slightly different code. Dr Tom Starke in turn credits Dr Ernie Chan, and by going to Dr Chan's blog I was able to find enough information to put together my own code from his principles. It doesn't use sqrt, uses the variance instead of std, and divides by 2.0 at the end instead of multiplying by 2.0. It seems to give the same results as the QuantStart code you posted, but I can understand it from first principles, which I guess is important.

I put together a Jupyter Notebook that makes this clearer, but I'm not sure if I can post that here, so I will explain as best I can. The code comes first, then the explanation.
```python
from numpy import asarray, log10, polyfit, subtract, var

lags = range(2, 100)


def hurst_ernie_chan(p):
    # Work on a plain array so slicing is positional
    # (np.subtract on two pandas Series would align them on their index)
    p = asarray(p, dtype=float)
    variancetau = []
    tau = []
    for lag in lags:
        # Write the different lags into a vector to compute a set of tau or lags
        tau.append(lag)
        # Compute the differences p(t + lag) - p(t); if p holds log prices these
        # are the lag-period log returns. Call this pp, the price difference,
        # and record its variance
        pp = subtract(p[lag:], p[:-lag])
        variancetau.append(var(pp))
    # We now have a set of tau (lags) and a corresponding set of variances.
    # print(tau)
    # print(variancetau)
    # Plot the log of those variances against the log of tau and get the slope
    m = polyfit(log10(tau), log10(variancetau), 1)
    hurst = m[0] / 2
    return hurst
```
Dr Chan doesn't give any code on this page (I believe he works in MATLAB rather than Python anyway), so I had to put together my own code from the notes in his blog and the answers he gives to questions posted there.
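Dr Chan states that if $z$ is the log price, then the volatility, sampled at intervals of $\tau$, is $\text{volatility}(\tau) = \sqrt{\operatorname{Var}(z(t) - z(t-\tau))}$. To me, another way of describing volatility is the standard deviation, so $\operatorname{std}(\tau) = \sqrt{\operatorname{Var}(z(t) - z(t-\tau))}$.

The standard deviation is just the square root of the variance, so squaring both sides gives $\operatorname{Var}(\tau) = \operatorname{Var}(z(t) - z(t-\tau))$.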
Dr Chan then states that, in general, we can write $\operatorname{Var}(\tau) \propto \tau^{2H}$, where $H$ is the Hurst exponent.

Hence $\operatorname{Var}(z(t) - z(t-\tau)) \propto \tau^{2H}$.
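Taking the log of each side turns the proportionality into a straight line with some intercept $c$: $\log \operatorname{Var}(z(t) - z(t-\tau)) = 2H \log \tau + c$.

So $2H$ is the slope of a log-log plot of the variances against $\tau$, and dividing that slope by 2 gives the Hurst exponent, which is what `m[0] / 2` does in the code:

$$H = \frac{1}{2} \cdot \frac{\Delta \log \operatorname{Var}(z(t) - z(t-\tau))}{\Delta \log \tau}$$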
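The same relation also explains the `sqrt(std(...))` in the QuantStart code from the question. A short derivation, assuming the fit $\log \operatorname{Var}(\tau) = 2H \log \tau + c$ above holds (the base of the logarithm does not matter, because it cancels when taking a slope of log against log):

$$
\log \sqrt{\operatorname{std}(\tau)}
= \tfrac{1}{2}\log \operatorname{std}(\tau)
= \tfrac{1}{4}\log \operatorname{Var}(\tau)
= \tfrac{1}{4}\left(2H \log \tau + c\right)
= \tfrac{H}{2}\log \tau + \tfrac{c}{4}
$$

So the slope of the QuantStart fit is $H/2$, and `poly[0]*2.0` recovers $H$. The square root only rescales the slope by a constant factor that the final multiplier undoes; fitting $\log \operatorname{Var}(\tau)$ directly gives slope $2H$, which is why `hurst_ernie_chan` divides by 2 instead.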
If you run that function and compare its answers with those of the QuantStart function, they should be the same. Not sure if that helped.
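A minimal sketch of that comparison, run on synthetic data rather than the Yahoo download (assumption: a simulated random walk and white noise stand in for real prices). `hurst_quantstart`, `LAGS`, `series` and `results` are hypothetical names introduced here; both estimators are condensed into list comprehensions but otherwise follow the code above.

```python
import numpy as np
from numpy import log, log10, polyfit, sqrt, std, subtract, var

LAGS = range(2, 100)  # same lag range as both functions in this thread


def hurst_quantstart(ts):
    """The question's estimator: slope of log(sqrt(std)) vs log(lag), times 2."""
    ts = np.asarray(ts, dtype=float)
    tau = [sqrt(std(subtract(ts[lag:], ts[:-lag]))) for lag in LAGS]
    return polyfit(log(LAGS), log(tau), 1)[0] * 2.0


def hurst_ernie_chan(p):
    """The answer's estimator: slope of log10(var) vs log10(lag), divided by 2."""
    p = np.asarray(p, dtype=float)
    variancetau = [var(subtract(p[lag:], p[:-lag])) for lag in LAGS]
    return polyfit(log10(LAGS), log10(variancetau), 1)[0] / 2


rng = np.random.default_rng(1)  # arbitrary fixed seed for reproducibility
steps = rng.standard_normal(50_000)
series = {
    "random walk": np.cumsum(steps),  # H should come out near 0.5
    "white noise": steps,             # H should come out near 0
}

results = {}
for name, x in series.items():
    h_q = hurst_quantstart(x)
    h_c = hurst_ernie_chan(x)
    results[name] = (h_q, h_c)
    print(f"{name:12s} QuantStart H = {h_q:.6f}   Chan H = {h_c:.6f}")
```

The two estimates should agree to floating-point precision for any input: the log base cancels in a slope, and both constant factors come to the same thing (1/4 from `sqrt(std)` followed by ×2, versus 1 from `var` followed by ÷2, i.e. one half of the variance slope in both cases).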