python pandas dataframe group-by pandas-resample

Pandas: calculate the morning averaged values or afternoon averaged values

I have a DataFrame like this:

gpi_data[['sig','hourtime']]
Out[28]: 
                   sig                   hourtime
datetime_doy                                     
2007-01-02   -8.963545 2007-01-02 09:20:11.249998
2007-01-03   -8.671357 2007-01-03 10:39:31.874991
2007-01-03   -8.996480 2007-01-03 20:22:59.999006
2007-01-04   -8.835958 2007-01-04 10:18:56.249024
2007-01-05   -8.785034 2007-01-05 21:21:39.374002
               ...                        ...
2019-12-30   -8.529724 2019-12-30 20:23:01.874996
2019-12-30   -8.563781 2019-12-30 20:48:28.125016
2019-12-30   -8.504211 2019-12-30 21:23:44.999996
2019-12-31   -8.460620 2019-12-31 09:39:31.873999
2019-12-31   -8.230092 2019-12-31 10:18:58.125014

[7983 rows x 2 columns]

I want to calculate the average value of sig for each morning and each evening, based on hourtime. By morning I mean data observed around 10:00:00, and by evening data observed around 22:00:00. If a day has no morning or evening values, that slot should be filled with np.nan.

For example, 2007-01-01 has no morning or evening values of sig, so it gets two np.nan values. 2007-01-02 has only a morning value, so its evening value is np.nan. 2019-12-30 has three evening values (at 20:23:01.874996, 20:48:28.125016 and 21:23:44.999996), so its evening value is the average of -8.529724, -8.563781 and -8.504211. Likewise, the last two data points fall on the morning of 2019-12-31, so they are averaged, and the evening of 2019-12-31 is filled with np.nan.

So ideally the final result would be:

gpi_data[['sig','hourtime']]
Out[28]: 
                   sig                   hourtime
datetime_doy                                     
2007-01-01   nan       2007-01-01 10:00:00
2007-01-01   nan       2007-01-01 22:00:00
2007-01-02   -8.963545 2007-01-02 09:20:11.249998
2007-01-02   nan       2007-01-02 22:00:00
2007-01-03   -8.671357 2007-01-03 10:39:31.874991
2007-01-03   -8.996480 2007-01-03 20:22:59.999006
2007-01-04   -8.835958 2007-01-04 10:18:56.249024
2007-01-04   nan       2007-01-04 22:00:00
2007-01-05   nan       2007-01-05 10:00:00
2007-01-05   -8.785034 2007-01-05 21:21:39.374002
               ...                        ...
2019-12-30   -8.532572 2019-12-30 22:00:00
2019-12-31   -8.345356 2019-12-31 09:39:31.873999
2019-12-31   nan       2019-12-31 22:00:00

It's fine if we round all hourtime to 10:00:00 or 22:00:00 like below:

gpi_data[['sig','hourtime']]
Out[28]: 
                   sig                   hourtime
datetime_doy                                     
2007-01-01   nan       2007-01-01 10:00:00
2007-01-01   nan       2007-01-01 22:00:00
2007-01-02   -8.963545 2007-01-02 10:00:00
2007-01-02   nan       2007-01-02 22:00:00
2007-01-03   -8.671357 2007-01-03 10:00:00
2007-01-03   -8.996480 2007-01-03 22:00:00
2007-01-04   -8.835958 2007-01-04 10:00:00
2007-01-04   nan       2007-01-04 22:00:00
2007-01-05   nan       2007-01-05 10:00:00
2007-01-05   -8.785034 2007-01-05 22:00:00
               ...                        ...
2019-12-30   -8.532572 2019-12-30 22:00:00
2019-12-31   -8.345356 2019-12-31 10:00:00
2019-12-31   nan       2019-12-31 22:00:00

How can I do this? Thanks!

Solution

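Use pd.cut to bin the hour of hourtime into two labels, 10 (morning) and 22 (evening). The bins below split the day at 12:00, so hours 0-11 count as morning and hours 12-23 as evening; adjust the edges if your observations need a different cutoff.

Then build a MultiIndex with MultiIndex.from_product that covers every day from 1 January of the first year to 31 December of the last year, paired with both labels. Aggregate with mean, add the missing combinations with Series.reindex, and finally rebuild the hourtime column:

import pandas as pd

# df is the question's gpi_data[['sig', 'hourtime']]
# right=False makes the bins [0, 12) and [12, 24), so hour 0 is not left unlabeled
df['hourtime'] = pd.cut(df['hourtime'].dt.hour, bins=[0, 12, 24], labels=[10, 22], right=False)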

start = pd.Timestamp(year=df.index.year.min(), month=1, day=1)
end = pd.Timestamp(year=df.index.year.max(), month=12, day=31)
mux = pd.MultiIndex.from_product([pd.date_range(start, end), [10,22]],
                                  names=['datetime_doy','h'])

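# mean per (day, label), then reindex so absent combinations become NaN
df = df.groupby([df.index, 'hourtime'])['sig'].mean().reindex(mux).reset_index(level=1)
# lowercase 'h': the uppercase 'H' unit alias is deprecated in recent pandas
df['hourtime'] = df.index + pd.to_timedelta(df.pop('h'), unit='h')
print(df)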
                   sig            hourtime
datetime_doy                              
2007-01-01         NaN 2007-01-01 10:00:00
2007-01-01         NaN 2007-01-01 22:00:00
2007-01-02   -8.963545 2007-01-02 10:00:00
2007-01-02         NaN 2007-01-02 22:00:00
2007-01-03   -8.671357 2007-01-03 10:00:00
               ...                 ...
2019-12-29         NaN 2019-12-29 22:00:00
2019-12-30         NaN 2019-12-30 10:00:00
2019-12-30   -8.532572 2019-12-30 22:00:00
2019-12-31   -8.345356 2019-12-31 10:00:00
2019-12-31         NaN 2019-12-31 22:00:00

[9496 rows x 2 columns]
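
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach, built from a few of the rows shown in the question. The names raw and out are only illustrative, and the sketch assumes the day is split at 12:00 (hours 0-11 count as morning, hours 12-23 as evening). It also converts the pd.cut labels to plain integers before grouping, which is my own choice to keep the group keys simple, not something the question requires.

```python
import pandas as pd

# A few rows copied from the question's printed DataFrame.
raw = [
    ("2007-01-02", -8.963545, "2007-01-02 09:20:11.249998"),
    ("2007-01-03", -8.671357, "2007-01-03 10:39:31.874991"),
    ("2007-01-03", -8.996480, "2007-01-03 20:22:59.999006"),
    ("2019-12-30", -8.529724, "2019-12-30 20:23:01.874996"),
    ("2019-12-30", -8.563781, "2019-12-30 20:48:28.125016"),
    ("2019-12-30", -8.504211, "2019-12-30 21:23:44.999996"),
    ("2019-12-31", -8.460620, "2019-12-31 09:39:31.873999"),
    ("2019-12-31", -8.230092, "2019-12-31 10:18:58.125014"),
]
df = pd.DataFrame(raw, columns=["datetime_doy", "sig", "hourtime"])
df["datetime_doy"] = pd.to_datetime(df["datetime_doy"])
df["hourtime"] = pd.to_datetime(df["hourtime"])
df = df.set_index("datetime_doy")

# Label each observation 10 (hours 0-11) or 22 (hours 12-23).
# Assumption: a 12:00 split; astype turns the categorical labels into ints.
df["h"] = pd.cut(
    df["hourtime"].dt.hour, bins=[0, 12, 24], labels=[10, 22], right=False
).astype("int64")

# Every calendar day of the covered years, paired with both labels.
start = pd.Timestamp(year=df.index.year.min(), month=1, day=1)
end = pd.Timestamp(year=df.index.year.max(), month=12, day=31)
mux = pd.MultiIndex.from_product(
    [pd.date_range(start, end), [10, 22]], names=["datetime_doy", "h"]
)

# Mean per (day, label); reindex fills the missing combinations with NaN.
out = df.groupby([df.index, "h"])["sig"].mean().reindex(mux).reset_index(level=1)
out["hourtime"] = out.index + pd.to_timedelta(out.pop("h"), unit="h")

# Show only the slots that actually had observations.
print(out.dropna())
```

If the observations can fall close to midnight or noon, the bin edges are the part to adjust; the rest of the pipeline does not depend on where the split is.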