Search code examples
pythonmathfinance

Hurst Exponent in python


from datetime import datetime
from pandas.io.data import DataReader
from numpy import cumsum, log, polyfit, sqrt, std, subtract
from numpy.random import randn

def hurst(ts):

    """Returns the Hurst Exponent of the time series vector ts"""
    # Create the range of lag values
    lags = range(2, 100)

    # Calculate the array of the variances of the lagged differences
    # Here it calculates the variances, but why it uses 
    # standard deviation and then make a root of it?
    tau = [sqrt(std(subtract(ts[lag:], ts[:-lag]))) for lag in lags]

    # Use a linear fit to estimate the Hurst Exponent
    poly = polyfit(log(lags), log(tau), 1)

    # Return the Hurst exponent from the polyfit output
    return poly[0]*2.0


# Download the stock prices series from Yahoo
aapl = DataReader("AAPL", "yahoo", datetime(2012,1,1), datetime(2015,9,18))

# Call the function
hurst(aapl['Adj Close'])

From this code for estimating Hurst Exponent, when we want to calculate the variance of the lagged difference, why we still use a standard deviation and take a square root? I am confused for a long time, and I don't know why others don't have the same confuse. Do I misunderstand the math behind it? Thanks!


Solution

  • I'm just as confused. I don't understand where the sqrt of std comes from either, and have spent 3 days trying to figure it out. In the end I noticed QuantStart credits Dr Tom Starke who uses a slightly different code. Dr Tom Starke credits Dr Ernie Chan, and going to his blog. I was able to find enough information to put together my own code from his principles. This doesn't use sqrt, uses variance instead of std and uses a 2.0 divisor at the end instead of a 2.0 multiplier. In the end, it seems to give the same results as the quantstart code you post, but I am able to understand it from first principles, which I guess is important. I put together a Jupyter Notebook which makes it clearer, but I'm not sure if I can post that here, so I will try to explain as best I can here. Code is pasted first, then an explanation.

    lags = range(2,100)
    def hurst_ernie_chan(p):
    
        variancetau = []; tau = []
    
        for lag in lags: 
    
            #  Write the different lags into a vector to compute a set of tau or lags
            tau.append(lag)
    
            # Compute the log returns on all days, then compute the variance on the difference in log returns
            # call this pp or the price difference
            pp = subtract(p[lag:], p[:-lag])
            variancetau.append(var(pp))
    
        # we now have a set of tau or lags and a corresponding set of variances.
        #print tau
        #print variancetau
    
        # plot the log of those variance against the log of tau and get the slope
        m = polyfit(log10(tau),log10(variancetau),1)
    
        hurst = m[0] / 2
    
        return hurst
    

    Dr Chan doesn't give any code on this page (I believe he works in MATLAB not Python anyway). Hence I needed to put together my own code from the notes he gives in his blog and answers he gives to questions posed on his blog.

    1. Dr Chan states that if z is the log price, then volatility, sampled at intervals of τ, is volatility(τ)=√(Var(z(t)-z(t-τ))). To me another way of describing volatility is standard deviation, so std(τ)=√(Var(z(t)-z(t-τ)))

    2. std is just the root of variance so var(τ)=(Var(z(t)-z(t-τ)))

    3. Dr Chan then states: In general, we can write Var(τ) ∝ τ^(2H) where H is the Hurst exponent

    4. Hence (Var(z(t)-z(t-τ))) ∝ τ^(2H)

    5. Taking the log of each side we get log (Var(z(t)-z(t-τ))) ∝ 2H log τ

    6. [ log (Var(z(t)-z(t-τ))) / log τ ] / 2 ∝ H (gives the Hurst exponent) where we know the term in square brackets on far left is the slope of a log-log plot of tau and a corresponding set of variances.

    If you run that function and compare the answers to the Quantstart function, they should be the same. Not sure if that helped.