Hurst Exponent in Python

from datetime import datetime
from pandas_datareader.data import DataReader  # pandas.io.data was removed from pandas
from numpy import cumsum, log, polyfit, sqrt, std, subtract
from numpy.random import randn

def hurst(ts):

    """Returns the Hurst Exponent of the time series vector ts"""
    # Create the range of lag values
    lags = range(2, 100)

    # Calculate the array of the variances of the lagged differences
    # This is supposed to compute the variances, so why does it take the
    # standard deviation and then the square root of that?
    tau = [sqrt(std(subtract(ts[lag:], ts[:-lag]))) for lag in lags]

    # Use a linear fit to estimate the Hurst Exponent
    poly = polyfit(log(lags), log(tau), 1)

    # Return the Hurst exponent from the polyfit output
    return poly[0]*2.0


# Download the stock prices series from Yahoo
aapl = DataReader("AAPL", "yahoo", datetime(2012,1,1), datetime(2015,9,18))

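# Call the function (pass a plain NumPy array so the slices are subtracted
# by position rather than aligned on the date index)
hurst(aapl['Adj Close'].values)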

In this code for estimating the Hurst exponent, the comment says we want the variance of the lagged differences, so why does the code take the standard deviation and then its square root? I have been confused by this for a long time, and I don't understand why nobody else seems to be confused by it. Am I misunderstanding the math behind it? Thanks!

Solution

I'm just as confused. I don't understand where the sqrt of std comes from either, and I spent 3 days trying to figure it out. In the end I noticed that QuantStart credits Dr Tom Starke, who uses slightly different code. Dr Tom Starke in turn credits Dr Ernie Chan, and from Dr Chan's blog I was able to find enough information to put together my own code from his principles. It doesn't use sqrt, it uses the variance instead of std, and it divides the slope by 2.0 at the end instead of multiplying by 2.0. It seems to give the same results as the QuantStart code you posted, but I can understand it from first principles, which I think is important.

I put together a Jupyter Notebook which makes this clearer, but I'm not sure whether I can post it here, so I will explain as best I can. The code comes first, then an explanation.

from numpy import log10, polyfit, subtract, var

lags = range(2, 100)

def hurst_ernie_chan(p):
    """Estimate the Hurst exponent of p, expected to be a sequence of log prices."""
    variancetau = []
    tau = []

    for lag in lags:
        # Record the lag (tau) used for this sample
        tau.append(lag)

        # Difference between the series and itself shifted by `lag`;
        # with log prices this is the log return over `lag` periods
        pp = subtract(p[lag:], p[:-lag])

        # Variance of those lagged differences
        variancetau.append(var(pp))

    # We now have a set of lags (tau) and a corresponding set of variances.
    # print(tau)
    # print(variancetau)

    # Fit a line to log(variance) against log(tau); the slope is 2H
    m = polyfit(log10(tau), log10(variancetau), 1)

    hurst = m[0] / 2

    return hurst

Dr Chan doesn't give any code on that blog page (I believe he works in MATLAB rather than Python anyway), so I had to put together my own code from the notes in his post and from the answers he gives to questions posed on his blog.

Dr Chan states that if z is the log price, then the volatility, sampled at intervals of τ, is volatility(τ) = √(Var(z(t) - z(t-τ))). To me, volatility is just another name for standard deviation, so std(τ) = √(Var(z(t) - z(t-τ))).
std is just the square root of the variance, so Var(τ) = Var(z(t) - z(t-τ)).
Dr Chan then states: in general, we can write Var(τ) ∝ τ^(2H), where H is the Hurst exponent.
Hence Var(z(t) - z(t-τ)) ∝ τ^(2H).
Taking the log of each side, we get log Var(z(t) - z(t-τ)) = 2H log τ + c, where c is a constant.
So on a log-log plot of the variances against τ, the slope is 2H, and H = slope / 2. That slope is what polyfit returns as m[0] in the code above.
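The log-log step can be checked on made-up numbers. The sketch below assumes variances that follow c * τ^(2H) exactly; H_true and c are arbitrary values picked for illustration and do not come from any real price series.

```python
import numpy as np

# Hypothetical values chosen only for illustration
H_true = 0.3
c = 2.5

tau = np.arange(2, 100)

# Variances that follow the power law Var(tau) = c * tau**(2H) exactly
variances = c * tau ** (2 * H_true)

# The slope of log(variance) against log(tau) is 2H; the intercept is log10(c)
slope, intercept = np.polyfit(np.log10(tau), np.log10(variances), 1)

H_est = slope / 2
print(H_est)
```

Because the input is an exact power law, the fit recovers H_true up to floating-point error. With real data the points only scatter around a line, so the slope is an estimate.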

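If you run that function and compare its output to the QuantStart function, the two should be the same. That is because sqrt(std(x)) equals Var(x)^(1/4), which scales as τ^(H/2); the QuantStart fit therefore has slope H/2, and multiplying it by 2.0 gives H. I hope that helps.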
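A quick way to check that the two functions agree is to run both on the same input. This is a minimal sketch under stated assumptions: the function names hurst_quantstart and hurst_variance are mine, the input is a synthetic random walk rather than real prices, and the seed is arbitrary.

```python
import numpy as np

def hurst_quantstart(ts, lags=range(2, 100)):
    # sqrt(std) is var**0.25, so the fitted slope is H/2 and gets doubled
    tau = [np.sqrt(np.std(ts[lag:] - ts[:-lag])) for lag in lags]
    slope = np.polyfit(np.log(list(lags)), np.log(tau), 1)[0]
    return slope * 2.0

def hurst_variance(ts, lags=range(2, 100)):
    # The variance scales as tau**(2H), so the fitted slope is 2H and gets halved
    v = [np.var(ts[lag:] - ts[:-lag]) for lag in lags]
    slope = np.polyfit(np.log(list(lags)), np.log(v), 1)[0]
    return slope / 2.0

# Synthetic random walk (assumption: stands in for a log-price series)
rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(20000))

h_qs = hurst_quantstart(walk)
h_var = hurst_variance(walk)
print(h_qs, h_var)
```

The two printed values should match to within floating-point error, and for a random walk both should land near 0.5. The base of the logarithm does not matter for the slope, as long as the same base is used on both axes.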