python pandas influxdb standard-deviation

Rolling standard deviation with Pandas, and NaNs

I have data that looks like this:

1472698113000000000     -28.84
1472698118000000000     -26.69
1472698163000000000     -27.65
1472698168000000000     -26.1
1472698238000000000     -27.33
1472698243000000000     -26.47
1472698248000000000     -25.24
1472698253000000000     -25.53
1472698283000000000     -27.3
...

This is a time series that grows. Each time it grows, I attempt to get the rolling standard deviation of the set, using pandas.rolling_std. Each time, the result includes NaNs, which I cannot use (I am trying to insert the result into InfluxDB, and it complains when it sees the NaNs.)

I've experimented with different window sizes. I am doing this on different series, of varying rates of growth and current sizes (some just a couple of measurements long, some hundreds or thousands).

Simply, I just want to have a rolling standard deviation in InfluxDB so that I can graph it and watch how the source data is changing over time, with respect to its mean. How can I overcome this NaN problem?

Solution

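Note that pandas.rolling_std was deprecated in pandas 0.18 and later removed; the current equivalent is the .rolling() method. If you are doing something like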

df.rolling(5).std()

and getting

0           NaN       NaN
1           NaN       NaN
2           NaN       NaN
3           NaN       NaN
4  5.032395e+10  1.037386
5  5.345559e+10  0.633024
6  4.263215e+10  0.967352
7  3.510698e+10  0.822879
8  1.767767e+10  0.971972

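the leading NaNs are expected: with a window of 5, the first four rows do not have a full window behind them yet, so no standard deviation is computed for them. You can strip those rows away with .dropna():

df.rolling(5).std().dropna()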

4  5.032395e+10  1.037386
5  5.345559e+10  0.633024
6  4.263215e+10  0.967352
7  3.510698e+10  0.822879
8  1.767767e+10  0.971972
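
If dropping the first window - 1 rows loses too much of a short series, a possible alternative is the min_periods argument of .rolling(), which lets pandas compute a result from a partial window. The sketch below is only an illustration under a few assumptions: it uses the nine sample points from the question, a window size of 5 picked to match the example above, and it moves the timestamps into the index (the first column of the output above is the standard deviation of the timestamps themselves, which is probably not wanted). The names series, full_windows and partial_windows are made up for the example.

```python
import pandas as pd

# Sample points from the question; timestamps are nanoseconds since the epoch.
timestamps = [
    1472698113000000000, 1472698118000000000, 1472698163000000000,
    1472698168000000000, 1472698238000000000, 1472698243000000000,
    1472698248000000000, 1472698253000000000, 1472698283000000000,
]
values = [-28.84, -26.69, -27.65, -26.1, -27.33, -26.47, -25.24, -25.53, -27.3]

# Put the timestamps in the index so only the measurements are rolled over.
series = pd.Series(values, index=pd.to_datetime(timestamps, unit="ns"))

window = 5  # assumed window size, matching the example above

# Default behaviour: min_periods equals the window size, so the first
# window - 1 results are NaN and dropna() removes them.
full_windows = series.rolling(window).std().dropna()

# Alternative: accept partial windows of at least two points. Only the very
# first row is still NaN, because the sample standard deviation of a single
# value is undefined, and dropna() removes it.
partial_windows = series.rolling(window, min_periods=2).std().dropna()

print(len(full_windows))     # 5 rows survive
print(len(partial_windows))  # 8 rows survive
```

Either result is free of NaNs and keeps its timestamp index, so it can be passed on to whatever writes to InfluxDB. Keep in mind that the early values from partial windows are based on fewer points and will be noisier than the rest.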