Tags: python, pandas, datetime, pandas-groupby, multi-index

Make a new index after grouping by hour and minute (or concatenating 2 levels of a multiindex)


Following this discussion, you can group data by the time of day rather than by the datetime itself:

import pandas as pd
import numpy as np

dr = pd.date_range('01-01-2020', '01-03-2020', freq='30T')
df = pd.DataFrame(np.random.rand(len(dr)), index=dr)
df = df.groupby([df.index.hour, df.index.minute]).sum()

This aggregates the data as expected, but returns a DataFrame with a MultiIndex, with unlabeled hours at one level and unlabeled minutes at the other:

              0
0  0   1.383863
   30  1.293589
1  0   0.545617
   30  0.208717
2  0   0.464892
   30  0.392486
...

I want to create a new single index which represents the times either as strings or as actual datetime.time(). I can get the times into a string, doing something like the following:

hours = df.index.get_level_values(0).astype(str).str.pad(2, fillchar='0')
minutes = df.index.get_level_values(1).astype(str).str.pad(2, fillchar='0')
new = hours + ':' + minutes
df = df.set_index(new, drop=True)
df.index.name = 'time'

Output:

              0
time           
00:00  1.203189
00:30  1.943932
01:00  1.671113
01:30  1.640122
02:00  1.240969
02:30  0.687489
...

This works but is a very cumbersome set of commands. My questions are:

  1. Is there an easier way to convert this MultiIndex to a combined string version?
  2. How could I convert the MultiIndex to datetime.time()? I know I could convert the original index with df.index = df.index.time, but I haven't found a way to take the hours from one level of index and minutes from another level of index to get time objects.

Solution

  • To group into strings:

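    import pandas as pd
    import numpy as np

    # '30min' is the current spelling of the 30-minute frequency;
    # the older alias '30T' is deprecated in recent pandas releases.
    dr = pd.date_range('01-01-2020', '01-03-2020', freq='30min')
    df = pd.DataFrame(np.random.rand(len(dr)), index=dr)
    # Group directly on the 'HH:MM' label of each timestamp, so the
    # result has a single string index instead of a MultiIndex.
    df = df.groupby(df.index.strftime('%H:%M')).sum()
    df.index.name = 'time'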
    

    Or to group into datetime.time, replace the groupby line with:

    df = df.groupby(df.index.time).sum()
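
If the frame has already been grouped by hour and minute, as in the question, the two index levels can also be combined after the fact. The following is only a minimal sketch under that assumption: the names grouped, as_strings and as_times are illustrative and not part of the original post, and it relies on the fact that iterating over a MultiIndex yields one (hour, minute) tuple per row.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Rebuild the question's grouped frame with its (hour, minute) MultiIndex.
dr = pd.date_range('2020-01-01', '2020-01-03', freq='30min')
df = pd.DataFrame(np.random.rand(len(dr)), index=dr)
grouped = df.groupby([df.index.hour, df.index.minute]).sum()

# Question 1: zero-padded 'HH:MM' strings built from each (hour, minute) tuple.
as_strings = grouped.copy()
as_strings.index = pd.Index(
    [f'{int(h):02d}:{int(m):02d}' for h, m in grouped.index], name='time'
)

# Question 2: datetime.time objects built from the same tuples.
# int() converts NumPy integers to plain Python ints first.
as_times = grouped.copy()
as_times.index = pd.Index(
    [dt.time(int(h), int(m)) for h, m in grouped.index], name='time'
)
```

Grouping on strftime or on df.index.time, as shown above, avoids the MultiIndex in the first place and is usually simpler. The list comprehensions are mainly useful when only the already-grouped frame is available.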