python dataframe group-by bucket resample

Count minutes per day over index


I have a dataframe with irregular timestamps (at one-second resolution) that spans multiple days. I would like to bucket the entries into one-minute buckets and store an increasing counter in a new column. All rows that fall within the same minute should get the same counter value, the counter should increase by one for each new minute that contains data on that day, and it should restart from 1 on each new day.

                       Value    Counter
2020-01-01 10:00:00      7.       1
2020-01-01 10:00:05      45.       1
2020-01-01 10:00:10      25.       1
2020-01-01 10:02:00      85.       2
2020-01-02 07:00:00      51.       1
2020-01-02 10:00:00      52.       2

I thought about something like this:

df['Counter'] = df.groupby([df.index.dt.day, df.index.dt.minute]).count()

This does not seem to work.


Solution

  • You can nest one .groupby (by hour and minute) inside another .groupby (by date) and number the inner groups with "ngroup":

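    import pandas as pd

    # Make sure the index is a DatetimeIndex
    df.index = pd.to_datetime(df.index)

    # Outer groupby: one group per calendar day.
    # Inner groupby: number each (hour, minute) bucket within that day, starting at 1.
    df["Counter"] = df.groupby(df.index.date, group_keys=False).apply(
        lambda x: x.groupby([x.index.hour, x.index.minute]).transform("ngroup") + 1
    )
    print(df)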

    Prints:

                         Value  Counter
    2020-01-01 10:00:00      7        1
    2020-01-01 10:00:05     45        1
    2020-01-01 10:00:10     25        1
    2020-01-01 10:02:00     85        2
    2020-01-02 07:00:00     51        1
    2020-01-02 10:00:00     52        2
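
As an alternative to the nested groupby, the same counter can probably be obtained as a dense rank of the minute-of-day within each date. This is only a minimal sketch and not part of the original answer: the sample frame is rebuilt from the data in the question, and the name minute_of_day is introduced here purely for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame(
    {"Value": [7.0, 45.0, 25.0, 85.0, 51.0, 52.0]},
    index=pd.to_datetime(
        [
            "2020-01-01 10:00:00",
            "2020-01-01 10:00:05",
            "2020-01-01 10:00:10",
            "2020-01-01 10:02:00",
            "2020-01-02 07:00:00",
            "2020-01-02 10:00:00",
        ]
    ),
)

# Minute of the day (0..1439) for every row; hypothetical helper name
minute_of_day = pd.Series(df.index.hour * 60 + df.index.minute, index=df.index)

# Dense rank of the minute within each calendar day:
# rows in the same minute share a rank, and ranks restart at 1 per day
df["Counter"] = (
    minute_of_day.groupby(df.index.date).rank(method="dense").astype(int)
)

print(df)
```

Dense ranking assigns consecutive numbers to the distinct minutes of a day in ascending order, so it does not depend on the rows being sorted by time.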