Search code examples
pythonpandasvolatility

Why I got NaN when computing volatility?


I am trying to follow the equations on this paper here , to calculate the historical volatility for power time series data.

The statistical description of the data as follows :

count    9855.000000
mean      291.135088
std       187.503344
min         0.000000
25%       112.408512
50%       332.370871
75%       449.527323
max       601.370058

Here is my implementation:


# Computing Volatility
window_size=27
VOLATILITY = pd.DataFrame()
VOLATILITY['PV']= data
#'intra-hour
# Compute the logarithmic value  
VOLATILITY['Log_IA'] = np.log(VOLATILITY['PV'] / VOLATILITY['PV'].shift(1))
# Compute Volatility using the pandas rolling standard deviation function
VOLATILITY['intra-hour'] = VOLATILITY['Log_IA'].rolling(window=window_size).std() * np.sqrt(window_size)# *100

My data have 1 year of half-hourly observations for 27 steps for each day

Here is an example of the output I got for 1 selected day where the NaN problem exists

    
Timestamp           PV          Log_IA     intra-hour
2019-03-01 06:00:00 0.000000    NaN         NaN
2019-03-01 06:30:00 2.946333    inf         NaN
2019-03-01 07:00:00 20.963667   1.962229    NaN
2019-03-01 07:30:00 38.284333   0.602250    NaN
2019-03-01 08:00:00 38.224667   -0.001560   NaN
2019-03-01 08:30:00 54.486667   0.354475    NaN
2019-03-01 09:00:00 54.608333   0.002230    NaN
2019-03-01 09:30:00 55.290667   0.012418    NaN
2019-03-01 10:00:00 54.317333   -0.017761   NaN
2019-03-01 10:30:00 54.680333   0.006661    NaN
2019-03-01 11:00:00 42.142333   -0.260451   NaN
2019-03-01 11:30:00 44.569000   0.055986    NaN
2019-03-01 12:00:00 36.988333   -0.186436   NaN
2019-03-01 12:30:00 35.802000   -0.032599   NaN
2019-03-01 13:00:00 29.006667   -0.210478   NaN
2019-03-01 13:30:00 43.254333   0.399572    NaN
2019-03-01 14:00:00 45.246333   0.045024    NaN
2019-03-01 14:30:00 29.768333   -0.418676   NaN
2019-03-01 15:00:00 37.510667   0.231180    NaN
2019-03-01 15:30:00 31.937000   -0.160860   NaN
2019-03-01 16:00:00 39.990333   0.224873    NaN
2019-03-01 16:30:00 32.263000   -0.214717   NaN
2019-03-01 17:00:00 40.707333   0.232487    NaN
2019-03-01 17:30:00 14.551333   -1.028726   NaN
2019-03-01 18:00:00 10.294333   -0.346089   NaN
2019-03-01 18:30:00 2.552667    -1.394455   NaN
2019-03-01 19:00:00 0.036333    -4.252158   NaN

So, why I am getting the NaN as a result for volatility ?

is it a problem with my implementation?


Solution

  • Your implementation is such that you will always get NaN for the zeroth value of Log_IA regardless of the values of PV. This is a result of the shift and can be confirmed by running the following snippet:

    import numpy as np
    import pandas as pd
    
    
    VOLATILITY = pd.DataFrame()
    VOLATILITY['PV'] = (1.0, 2.0, 3.0)
    VOLATILITY['Log_IA'] = np.log(VOLATILITY['PV'] / VOLATILITY['PV'].shift(1))
    

    You will see that VOLATILITY is:

        PV    Log_IA
    0  1.0       NaN
    1  2.0  0.693147
    2  3.0  0.405465
    

    The zeroth value of Log_IA is NaN because you are dividing \log{1.0} by the previous value of PV, that is the value of PV at index -1. This is what shift() does, but there is no value at index -1 so you get a NaN. You may set a value to use in place of any non-existent values with the fill_value argument: shift(1, fill_value=123). You will also get inf for any value of Log_IA when the previous index's PV value is 0.

    So why do you get NaNs for all values of intra-hour? You take the rolling standard deviation of VOLATILITY['Log_IA'].rolling(window=window_size) which contains inf at index 0, and standard deviation is undefined for sequences containing inf. So the rolling standard deviation is NaN, and any arithmetic operation with NaN results in NaN.

    Now, you also have an issue with \log{0} in your code. Logarithm is undefined for values less than or equal to 0, but this is not what is causing your NaNs.

    https://www.varsitytutors.com/hotmath/hotmath_help/topics/logarithmic-functions.html