Search code examples
pythonpandastime-seriesdatetimeindex

Extracting time interval by minute and hour in a particular day using datetimeindex of pandas


Suppose we have a dataframe including time indices and we want to extract only a dataframe including 10:23 to 14:34. How can we do this?

n =1000
i = pd.date_range('2018-04-09', periods=n, freq='1min')
ts = pd.DataFrame({'A': [i for i in range(n)]}, index=i)
print(ts)

                       A
2018-04-09 00:00:00    0
2018-04-09 00:01:00    1
2018-04-09 00:02:00    2
2018-04-09 00:03:00    3
2018-04-09 00:04:00    4
...                  ...
2018-04-09 16:35:00  995
2018-04-09 16:36:00  996
2018-04-09 16:37:00  997
2018-04-09 16:38:00  998
2018-04-09 16:39:00  999

My try:

I think for every problem like this, we need to break it into 3 conditions. Correct me if I am wrong.

mask1 = ( 10 == ts.index.hour & 23 <= ts.index.minute)
mask2 = ( 10 <= ts.index.hour )
mask3 = ( 14 == ts.index.hour & 34 >= ts.index.minute)

mask = mask1 | mask2 | mask3
ts_desire = ts[mask]

Then I get TypeError: Input must be Index or array-like.


Solution

  • Update

    Why it starts from 10? It is supposed to start from 10:23 inclusive and ends at 16:34 inclusive

    Maybe your are looking for between_time:

    >>> ts.between_time('10:23', '16:34')
    
                           A
    2018-04-09 10:23:00  623
    2018-04-09 10:24:00  624
    2018-04-09 10:25:00  625
    2018-04-09 10:26:00  626
    2018-04-09 10:27:00  627
    ...                  ...
    2018-04-09 16:30:00  990
    2018-04-09 16:31:00  991
    2018-04-09 16:32:00  992
    2018-04-09 16:33:00  993
    2018-04-09 16:34:00  994
    
    [372 rows x 1 columns]
    

    Missing ( ). Take care of operator priority: & take precedence over ==.

    #                  HERE ----v---v
    mask1 = (10 == ts.index.hour) & (23 <= ts.index.minute)
    mask2 = (10 <= ts.index.hour)
    mask3 = (14 == ts.index.hour) & (34 >= ts.index.minute)
    #                  HERE ----^---^
    
    mask = mask1 | mask2 | mask3
    ts_desire = ts[mask]
    

    Output:

    >>> ts_desire
                           A
    2018-04-09 10:00:00  600
    2018-04-09 10:01:00  601
    2018-04-09 10:02:00  602
    2018-04-09 10:03:00  603
    2018-04-09 10:04:00  604
    ...                  ...
    2018-04-09 16:35:00  995
    2018-04-09 16:36:00  996
    2018-04-09 16:37:00  997
    2018-04-09 16:38:00  998
    2018-04-09 16:39:00  999
    
    [400 rows x 1 columns]