I have data that looks like this:
1472698113000000000 -28.84
1472698118000000000 -26.69
1472698163000000000 -27.65
1472698168000000000 -26.1
1472698238000000000 -27.33
1472698243000000000 -26.47
1472698248000000000 -25.24
1472698253000000000 -25.53
1472698283000000000 -27.3
...
This is a time series that grows. Each time it grows, I attempt to get the rolling standard deviation of the set, using pandas.rolling_std
. Each time, the result includes NaNs, which I cannot use (I am trying to insert the result into InfluxDB, and it complains when it sees the NaNs.)
I've experimented with different window sizes. I am doing this on different series, of varying rates of growth and current sizes (some just a couple of measurements long, some hundreds or thousands).
Simply, I just want to have a rolling standard deviation in InfluxDB so that I can graph it and watch how the source data is changing over time, with respect to its mean. How can I overcome this NaN problem?
If you are doing something like
df.rolling(5).std()
and getting
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 5.032395e+10 1.037386
5 5.345559e+10 0.633024
6 4.263215e+10 0.967352
7 3.510698e+10 0.822879
8 1.767767e+10 0.971972
You can strip away the NaNs by using .dropna()
.
df.rolling(5).std().dropna()
:
4 5.032395e+10 1.037386
5 5.345559e+10 0.633024
6 4.263215e+10 0.967352
7 3.510698e+10 0.822879
8 1.767767e+10 0.971972