python, pandas, volatility

Calculating volatility manually vs. with built-in functions gives different results


Can someone help me understand where I'm going wrong? I don't know why the two volatility columns come out different...

This is an example of my code:

from math import sqrt
from numpy import around
from numpy.random import uniform
from pandas import DataFrame
from statistics import stdev

data = around(a=uniform(low=1.0, high=50.0, size=(500, 1)), decimals=3)
df = DataFrame(data=data, columns=['close'], dtype='float64')
df.loc[:, 'delta'] = df.loc[:, 'close'].pct_change().fillna(0).round(3)

volatility = []

for index in range(df.shape[0]):
    if index < 90:
        volatility.append(0)
    else:
        start = index - 90
        stop = index + 1
        volatility.append(stdev(df.loc[start:stop, 'delta']) * sqrt(252))

df.loc[:, 'volatility1'] = volatility
df.loc[:, 'volatility2'] = df.loc[:, 'delta'].rolling(window=90).std(ddof=0) * sqrt(252)

print(df)

      close   delta  volatility1  volatility2
0    10.099   0.000     0.000000          NaN
1    26.331   1.607     0.000000          NaN
2    32.361   0.229     0.000000          NaN
3     2.068  -0.936     0.000000          NaN
4    36.241  16.525     0.000000          NaN
..      ...     ...          ...          ...
495  48.015  -0.029    46.078037    46.132943
496   6.988  -0.854    46.036210    46.178820
497  23.331   2.339    46.003184    45.837245
498  25.551   0.095    45.608260    45.792188
499  46.248   0.810    45.793012    45.769787

[500 rows x 4 columns]

Thank you so much!


Solution

  • A few small changes are needed; comments are added inline. The window bounds use 89, with stop = index, because .loc slicing is inclusive of the endpoint (unlike most Python slicing), so df.loc[index - 89:index, 'delta'] covers exactly 90 rows. ddof=1 is needed because statistics.stdev computes the sample standard deviation, which is what ddof=1 means. This article talks about numpy's std instead of stdev, but the explanation of what ddof does is the same. A short demo of both points follows the corrected code below.

    Also, in the future, try changing size to something like (95, 1). You don't need the other 405 rows while debugging, and it is nice to see the changeover from 0/NaN to actual volatility values, which makes it clear that you need 89 rather than 90.

    The 0 vs NaN difference in the first 89 rows still exists. It comes from you appending 0 in the loop while rolling returns NaN until its window is full. I wasn't sure whether that was intentional, so I left it; a sketch of two ways to reconcile it follows the demo below.

    from math import sqrt
    from numpy import around
    from numpy.random import uniform
    from pandas import DataFrame
    from statistics import stdev
    
    data = around(a=uniform(low=1.0, high=50.0, size=(500, 1)), decimals=3)
    df = DataFrame(data=data, columns=['close'], dtype='float64')
    df['delta'] = df['close'].pct_change().fillna(0).round(3)
    
    volatility = []
    
    for index in range(df.shape[0]):
        if index < 89: # changed to 89: the first full 90-row window ends at index 89
            volatility.append(0)
        else:
            start = index - 89 # changed to 89: .loc slicing includes the endpoint
            stop = index       # changed from index + 1 so the window is exactly 90 rows and uses no future data
            volatility.append(stdev(df.loc[start:stop, 'delta']) * sqrt(252))
    
    df['volatility1'] = volatility
    df['volatility2'] = df['delta'].rolling(window=90).std(ddof=1) * sqrt(252) # changed to ddof=1 to match statistics.stdev (sample standard deviation)
    
    print(df)
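
    As a quick illustration of the two points above (a minimal sketch on a tiny throwaway Series, not part of the original code): .loc slicing includes its endpoint, and statistics.stdev agrees with numpy's std only when ddof=1.

    from statistics import stdev

    import numpy as np
    from pandas import Series

    s = Series([1.0, 2.0, 3.0, 4.0, 5.0])

    # label-based .loc slicing is inclusive of both ends: labels 0..4 -> 5 values,
    # while positional .iloc slicing stops before the endpoint -> 4 values
    print(len(s.loc[0:4]))   # 5
    print(len(s.iloc[0:4]))  # 4

    # statistics.stdev is the sample standard deviation, i.e. ddof=1;
    # numpy's default (ddof=0) is the population standard deviation
    x = s.to_numpy()
    print(stdev(x))           # ~1.5811
    print(np.std(x, ddof=1))  # same value
    print(np.std(x, ddof=0))  # ~1.4142, the smaller population estimate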
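
    And if you want the two columns to agree on the first 89 rows as well, here is a rough sketch of two options; pick whichever matches your intent:

    # option 1: make the rolling column match the loop by replacing its
    # leading NaNs with 0
    df['volatility2'] = df['volatility2'].fillna(0)

    # option 2: make the loop match rolling instead, by appending
    # float('nan') rather than 0 in the "if index < 89" branch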