Search code examples
pythonpandasstandard-deviationrolling-computation

Pandas rolling stdev behavior


I am using Pandas to calculate Bollinger Bands for security prices. I am using the 20-day average closing price and two standard deviations +/- that value to calculate the bands.

I noticed differences between my calculated values and those reported on third party data providers when using pandas .rolling().std(); upon calculating the stdev "manually" (step-by-step), I was able to reproduce values that were consistent with the third party sites, and similarly varied from the values calculated with .rolling.std() on the same data.

A subset of the data is presented below, with the two sets of calculations; "manual" and pandas "rolling." Is there something in my code that is causing the pandas rolling data to vary?

close_values = {'close': [88.64, 87.02, 88.7, 89.89, 90.93, 90.4, 90.21, 90.16, 90.94, 91.93, 
                          90.26, 85.5, 81.89, 82.18, 78.31, 75.65, 79.61, 76.55, 80.41, 77.87]}
df = pd.DataFrame(close_values)

df['period_mean'] = df['close'].mean()
df['abs_^2_var'] = (np.abs(df['close'] - df['period_mean'])) ** 2
df['stdev_man'] = np.sqrt(df['abs_^2_var'].rolling(20).mean())
df['stdev_roll'] = df['close'].rolling(20).std()

df['BolU_man'] = round(df['period_mean'] + (df['stdev_man'] * 2), 4)
df['BolL_man'] = round(df['period_mean'] - (df['stdev_man'] * 2), 4)

df['BolU_roll'] = round(
    df['close'].rolling(20).mean() + (2 * (df['close'].rolling(20).std())), 4)
df['BolL_roll'] = round(
    df['close'].rolling(20).mean() - (2 * (df['close'].rolling(20).std())), 4)

print(df)

This is the output of the above:

    close  period_mean  abs_^2_var  stdev_man  stdev_roll  BolU_man  BolL_man  \
0   88.64      85.3525   10.807656        NaN         NaN       NaN       NaN   
1   87.02      85.3525    2.780556        NaN         NaN       NaN       NaN   
2   88.70      85.3525   11.205756        NaN         NaN       NaN       NaN   
3   89.89      85.3525   20.588906        NaN         NaN       NaN       NaN   
4   90.93      85.3525   31.108506        NaN         NaN       NaN       NaN   
5   90.40      85.3525   25.477256        NaN         NaN       NaN       NaN   
6   90.21      85.3525   23.595306        NaN         NaN       NaN       NaN   
7   90.16      85.3525   23.112056        NaN         NaN       NaN       NaN   
8   90.94      85.3525   31.220156        NaN         NaN       NaN       NaN   
9   91.93      85.3525   43.263506        NaN         NaN       NaN       NaN   
10  90.26      85.3525   24.083556        NaN         NaN       NaN       NaN   
11  85.50      85.3525    0.021756        NaN         NaN       NaN       NaN   
12  81.89      85.3525   11.988906        NaN         NaN       NaN       NaN   
13  82.18      85.3525   10.064756        NaN         NaN       NaN       NaN   
14  78.31      85.3525   49.596806        NaN         NaN       NaN       NaN   
15  75.65      85.3525   94.138506        NaN         NaN       NaN       NaN   
16  79.61      85.3525   32.976306        NaN         NaN       NaN       NaN   
17  76.55      85.3525   77.484006        NaN         NaN       NaN       NaN   
18  80.41      85.3525   24.428306        NaN         NaN       NaN       NaN   
19  77.87      85.3525   55.987806   5.495136     5.63789   96.3428   74.3622   

    BolU_roll  BolL_roll  
0         NaN        NaN  
1         NaN        NaN  
2         NaN        NaN  
3         NaN        NaN  
4         NaN        NaN  
5         NaN        NaN  
6         NaN        NaN  
7         NaN        NaN  
8         NaN        NaN  
9         NaN        NaN  
10        NaN        NaN  
11        NaN        NaN  
12        NaN        NaN  
13        NaN        NaN  
14        NaN        NaN  
15        NaN        NaN  
16        NaN        NaN  
17        NaN        NaN  
18        NaN        NaN  
19    96.6283    74.0767  

Solution

  • It looks like the Pandas rolling standard deviation function applies Bessel's correction by default:

    The default ddof of 1 used in Series.std is different than the default ddof of 0 in numpy.std.

    You can amend this behaviour by passing ddof=0:

    df['stdev_roll'] = df['close'].rolling(20).std(ddof=0)
    

    This gives the same result as in your manual calculation.