Search code examples
pythonpandaspython-datetime

Creating an index after performing groupby on DateTIme


I have the following data in the format below (see below)

enter image description here

I next perform recasting, groupby and averaging (see code) to reduce data dimensionality.

df_mod=pd.read_csv('wet_bulb_hr.csv')
#Mod Date
df_mod['wbt_date'] = pd.to_datetime(df_mod['wbt_date']) 

#Mod Time
df_mod['wbt_time'] = df_mod['wbt_time'].astype('int')
df_mod['wbt_date'] = df_mod['wbt_date'] + \
                     pd.to_timedelta(df_mod['wbt_time']-1, unit='h')

df_mod['wet_bulb_temperature'] = \
df_mod['wet_bulb_temperature'].astype('float')
df = df_mod
df = df.drop(['wbt_time','_id'], axis = 1)
#df_novel = df.mean()
df = df.groupby([df.wbt_date.dt.year,df.wbt_date.dt.month]).mean()

After writing to an output file, I get an output that looks like this.

enter image description here

Investigating further, I can understand why. All my processing has resulted in a dataframe of shape 1 but what I really need is the 2 wbt_date columns to be exported as well. This does not seem to happen due to the groupby function

enter image description here

My question: How do I generate an index and have the groupby wbt_date columns as a new single column such that the output is: enter image description here


Solution

  • You can flatten MultiIndex to Index in YYYY-MM by list comprehension:

    df = df.groupby([df.wbt_date.dt.year,df.wbt_date.dt.month]).mean()
    df.index = [f'{y}-{m}' for y, m in df.index]
    df = df.rename_axis('date').reset_index()
    

    Or use month period by Series.dt.to_period:

    df = df.groupby([df.wbt_date.dt.to_period('m')).mean().reset_index()