
Rolling standard deviation with Pandas, and NaNs


I have data that looks like this:

1472698113000000000     -28.84
1472698118000000000     -26.69
1472698163000000000     -27.65
1472698168000000000     -26.1
1472698238000000000     -27.33
1472698243000000000     -26.47
1472698248000000000     -25.24
1472698253000000000     -25.53
1472698283000000000     -27.3
...

This is a time series that grows. Each time it grows, I compute the rolling standard deviation of the whole set with pandas.rolling_std. Every time, the result includes NaNs, which I cannot use: I am trying to insert the result into InfluxDB, and it complains when it sees the NaNs.
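A minimal reproduction of the problem, using the sample readings above. It assumes the modern `Series.rolling(...).std()` API, since `pandas.rolling_std` has been deprecated and removed from current pandas, and a window of 5 chosen only for illustration:

```python
import pandas as pd

# The sample readings from the question: nanosecond epoch timestamps -> values.
timestamps = [
    1472698113000000000, 1472698118000000000, 1472698163000000000,
    1472698168000000000, 1472698238000000000, 1472698243000000000,
    1472698248000000000, 1472698253000000000, 1472698283000000000,
]
values = [-28.84, -26.69, -27.65, -26.1, -27.33, -26.47, -25.24, -25.53, -27.3]

# pd.to_datetime interprets plain integers as nanoseconds since the epoch.
series = pd.Series(values, index=pd.to_datetime(timestamps))

# Modern equivalent of pandas.rolling_std(series, 5).
rolling = series.rolling(5).std()

# min_periods defaults to the window size, so the first 4 entries are NaN.
print(rolling.isna().sum())  # prints 4
```

Whatever window size is chosen, the first `window - 1` results are NaN, and a series shorter than the window produces nothing but NaN.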

I've experimented with different window sizes. I am doing this on several series with varying growth rates and current sizes (some just a couple of measurements long, others hundreds or thousands).

Put simply, I want a rolling standard deviation in InfluxDB so that I can graph it and watch how the source data changes over time relative to its mean. How can I get around this NaN problem?


Solution

  • If you are doing something like

    df.rolling(5).std()

    and getting

    0           NaN       NaN
    1           NaN       NaN
    2           NaN       NaN
    3           NaN       NaN
    4  5.032395e+10  1.037386
    5  5.345559e+10  0.633024
    6  4.263215e+10  0.967352
    7  3.510698e+10  0.822879
    8  1.767767e+10  0.971972
    

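    The first four rows are NaN because a rolling window needs a full window of 5 observations before it produces a value (min_periods defaults to the window size). You can strip those rows away with .dropna():

    df.rolling(5).std().dropna()

    which returns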

    4  5.032395e+10  1.037386
    5  5.345559e+10  0.633024
    6  4.263215e+10  0.967352
    7  3.510698e+10  0.822879
    8  1.767767e+10  0.971972
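Plain `.dropna()` keeps only rows with a full window, so a series that is still shorter than the window yields an empty result. A sketch that also covers short series, using the real `min_periods` argument of `rolling()`; the helper name `rolling_std_no_nan` is hypothetical:

```python
import pandas as pd


def rolling_std_no_nan(series: pd.Series, window: int) -> pd.Series:
    """Rolling sample standard deviation with every NaN row removed.

    min_periods=2 lets series shorter than `window` still produce values.
    2 is the minimum useful setting because the sample std (ddof=1) of a
    single observation is NaN. `window` must therefore be at least 2.
    """
    result = series.rolling(window, min_periods=2).std()
    # The first row only ever sees one observation, so it is always NaN;
    # drop it (and any other row lacking two valid observations).
    return result.dropna()


values = pd.Series([-28.84, -26.69, -27.65, -26.1, -27.33, -26.47, -25.24, -25.53, -27.3])

full = rolling_std_no_nan(values, 5)             # rows 1..8
short = rolling_std_no_nan(values.iloc[:2], 5)   # one value although len < window
single = rolling_std_no_nan(values.iloc[:1], 5)  # empty: nothing to write yet
print(len(full), len(short), len(single))  # prints 8 1 0
```

From row 4 onward the values are identical to `rolling(5).std()`; the trade-off is that the earliest points are computed from fewer than 5 observations and are therefore noisier.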