Tags: python, pandas, multi-index, datetimeindex

Slicing pandas DatetimeIndex with steps


I often work with pandas DataFrames that have a DatetimeIndex, where I want to, for example, select only the rows whose index hour is 6. The only way I currently know how to do this is by reindexing:

df.reindex(pd.date_range(*df.index.to_series().agg([min, max]).apply(lambda ts: ts.replace(hour=6)), freq="24H"))

But this is hard to read and overly complex, and it gets even worse with a MultiIndex that has several datetime levels. I know of approaches that use .reset_index() and then filter with either df.where or df.loc and a boolean condition, but is there a simpler way to do this with regular index slicing? I tried the following:

df.loc[df.index.min().replace(hour=6)::pd.Timedelta(24, unit="H")]

but this gives a TypeError:

TypeError: '>=' not supported between instances of 'Timedelta' and 'int'
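A note on why the attempt above fails: with a slice in `.loc`, pandas resolves the start and stop labels to row positions, but the step appears to be passed through as a plain positional integer, so a `Timedelta` step ends up compared against `0` and raises the TypeError shown. A minimal sketch, assuming the data is sorted, hourly and has no gaps (the sample frame mirrors the setup in the answer), where an integer step of 24 rows gives the intended selection:

```python
import pandas as pd

# Hourly sample data with no gaps (same shape as the answer's setup).
dti = pd.date_range('2022-3-1', '2022-3-22', freq='h')
df = pd.DataFrame({'val': range(1, len(dti) + 1)}, index=dti)

# The start is a timestamp label, but the step counts rows, not time.
# With exactly one row per hour, every 24th row from 06:00 is 06:00 each day.
start = df.index.min().replace(hour=6)
every_day_at_6 = df.loc[start::24]

print(every_day_at_6.head(3))
```

This only lines up with "hour == 6" while every hour is present; a single missing row shifts every later pick, which is why a boolean mask on the hour is the safer choice.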

Solution

  • If your index is a DatetimeIndex, you can filter with a boolean mask on its hour attribute:

    >>> df[df.index.hour == 6]
                         val
    2022-03-01 06:00:00    7
    2022-03-02 06:00:00   31
    2022-03-03 06:00:00   55
    2022-03-04 06:00:00   79
    2022-03-05 06:00:00  103
    2022-03-06 06:00:00  127
    2022-03-07 06:00:00  151
    2022-03-08 06:00:00  175
    2022-03-09 06:00:00  199
    2022-03-10 06:00:00  223
    2022-03-11 06:00:00  247
    2022-03-12 06:00:00  271
    2022-03-13 06:00:00  295
    2022-03-14 06:00:00  319
    2022-03-15 06:00:00  343
    2022-03-16 06:00:00  367
    2022-03-17 06:00:00  391
    2022-03-18 06:00:00  415
    2022-03-19 06:00:00  439
    2022-03-20 06:00:00  463
    2022-03-21 06:00:00  487
    

    Setup:

    import pandas as pd

    dti = pd.date_range('2022-3-1', '2022-3-22', freq='h')
    df = pd.DataFrame({'val': range(1, len(dti)+1)}, index=dti)
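For a plain DatetimeIndex, pandas also provides `DataFrame.at_time`, which selects rows at a given time of day. It requires a DatetimeIndex, so for the MultiIndex case from the question one option is to build the same boolean mask from a single level with `Index.get_level_values`. A sketch, where the level names `site` and `time` and the two-site frame are made up for illustration:

```python
import pandas as pd

# Plain DatetimeIndex: same sample data as the setup above.
dti = pd.date_range('2022-3-1', '2022-3-22', freq='h')
df = pd.DataFrame({'val': range(1, len(dti) + 1)}, index=dti)

# at_time matches on time of day, so gaps in the index do not matter.
at_six = df.at_time('06:00')

# Hypothetical MultiIndex with levels 'site' and 'time' (illustrative only).
mi = pd.MultiIndex.from_product([['a', 'b'], dti], names=['site', 'time'])
mdf = pd.DataFrame({'val': range(len(mi))}, index=mi)

# Take one datetime level and filter on its hour; no reset_index needed.
mask = mdf.index.get_level_values('time').hour == 6
mdf_at_six = mdf[mask]

print(at_six.head(3))
print(mdf_at_six.head(3))
```

With several datetime levels, masks built this way from each level can be combined with `&` before indexing.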