python pandas dataframe sum pandas-resample

Pandas resample to return NaN when all values are NaN


I'm using resample to sum my data into hourly blocks. When all input data for the hour is NaN, resample is producing a value of 0 instead of NaN.

My raw data is this:

infile
Out[206]:
             Date_time  Rainfall
0  2019-02-02 14:18:00       NaN
1  2019-02-02 14:20:00       NaN
2  2019-02-02 14:25:00       NaN
3  2019-02-02 14:30:00       NaN
4  2019-02-02 14:35:00       NaN
5  2019-02-02 14:40:00       NaN
6  2019-02-02 14:45:00       NaN
7  2019-02-02 14:50:00       NaN
8  2019-02-02 14:55:00       NaN
9  2019-02-02 15:00:00       0.0
10 2019-02-02 15:05:00       NaN
11 2019-02-02 15:10:00       NaN
12 2019-02-02 15:15:00       NaN
13 2019-02-02 15:20:00       NaN
14 2019-02-02 15:25:00       NaN
15 2019-02-02 15:30:00       NaN
16 2019-02-02 15:35:00       NaN
17 2019-02-02 15:40:00       NaN
18 2019-02-02 15:45:00       NaN
19 2019-02-02 15:50:00       NaN
20 2019-02-02 15:55:00       NaN

I want my output to be this:

             Date_time  Rainfall  
0  2019-02-02 14:18:00       NaN
1  2019-02-02 15:00:00       0.0

But instead I'm getting this:

output[['Date_time', 'Rainfall']]
Out[208]: 
                Date_time  Rainfall
0     2019-02-02 14:18:00       0.0
1     2019-02-02 15:00:00       0.0

This is the code I'm using to get there. It's a little more complicated than this example needs, because I also use it to iterate through a list of column names elsewhere:

def sum_calc(col_name):
    col = infile[['Date_time', col_name]].copy()
    col.columns = ('A', 'B')
    col = col.resample('H', on='A').B.sum().reset_index(drop=True)
    output[col_name] = col.copy()

sum_calc('Rainfall')

Any clues on how to get this to work? I've had a look online, and all the options I found produce NaN if any value in the group is NaN, rather than only when all values are NaN, which is what I'm after.


Solution

  • Try passing min_count=1 to the sum, so that an hour with no non-NaN values gives NaN instead of 0:

    >>> df.resample("H", on="Date_time")["Rainfall"].agg(pd.Series.sum, min_count=1)
    Date_time
    2021-12-17 14:00:00    NaN
    2021-12-17 15:00:00    0.0
    Freq: H, Name: Rainfall, dtype: float64
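
    For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the sample data from the question and compares the default sum with min_count=1. A few things in it are my own assumptions rather than part of the original post: the frame is rebuilt by hand instead of loaded from a file, the frequency is written as "60min" instead of "H" so that it also runs on newer pandas versions where "H" is deprecated, and sum_calc_nan is a hypothetical variant of the asker's sum_calc that returns the result instead of writing to a global output frame.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample: 21 readings, all NaN except a single 0.0 at 15:00.
times = pd.DatetimeIndex(["2019-02-02 14:18:00"]).append(
    pd.date_range("2019-02-02 14:20", "2019-02-02 15:55", freq="5min")
)
rainfall = np.full(len(times), np.nan)
rainfall[times == pd.Timestamp("2019-02-02 15:00")] = 0.0
infile = pd.DataFrame({"Date_time": times, "Rainfall": rainfall})

# Assumption: "60min" stands in for "H" (hourly bins either way).
hourly = infile.resample("60min", on="Date_time")["Rainfall"]

# Default min_count=0: an all-NaN hour sums to 0.0.
hourly_default = hourly.sum()

# min_count=1: an hour needs at least one non-NaN value, otherwise the sum is NaN.
hourly_fixed = hourly.sum(min_count=1)

print(hourly_default)  # 14:00 -> 0.0, 15:00 -> 0.0
print(hourly_fixed)    # 14:00 -> NaN, 15:00 -> 0.0


def sum_calc_nan(frame, col_name):
    """Hypothetical variant of sum_calc: hourly sums that stay NaN for all-NaN hours."""
    summed = frame.resample("60min", on="Date_time")[col_name].sum(min_count=1)
    return summed.reset_index(drop=True)


rainfall_hourly = sum_calc_nan(infile, "Rainfall")
```

    Calling .sum(min_count=1) directly on the resampled column should be equivalent to the .agg(pd.Series.sum, min_count=1) form above; I just find it a little shorter to read.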