Tags: python, pandas, datetime, pandas-groupby, in-place

DataFrame: add a column in place with the cumcount() of each date in the DatetimeIndex


I would like to add a column (with header 'acc_dates') to a DataFrame with a DatetimeIndex. The column should hold a running count of the rows within each date, restarting at 0 on every new day. Here is an example:

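import numpy as np
import pandas as pd

# 50 rows at 4-hour intervals starting 2018-01-05 (ISO date avoids day/month ambiguity)
datarange = pd.date_range('2018-01-05 00:00:00', periods=50, freq='4h')
range_series_1 = pd.Series(np.random.randint(-5, 3, size=50).astype(float), index=datarange)
df = pd.DataFrame({'value1': range_series_1})
df.index.name = 'datetime'  # matches the index label shown in the output below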

df.head(5)
Out[287]: 
                     value1
datetime                   
2018-01-05 00:00:00     1.0
2018-01-05 04:00:00    -2.0
2018-01-05 08:00:00    -2.0
2018-01-05 12:00:00    -3.0
2018-01-05 16:00:00     1.0

When I apply cumcount(), the result contains only the counts and the 'value1' column is gone. This is what I type and what I get:

df.groupby(df.index.date).cumcount().to_frame('acc_dates').head(15)
Out[288]: 
                     acc_dates
datetime                      
2018-01-05 00:00:00          0
2018-01-05 04:00:00          1
2018-01-05 08:00:00          2
2018-01-05 12:00:00          3
2018-01-05 16:00:00          4
2018-01-05 20:00:00          5
2018-01-06 00:00:00          0
2018-01-06 04:00:00          1
2018-01-06 08:00:00          2
2018-01-06 12:00:00          3
2018-01-06 16:00:00          4
2018-01-06 20:00:00          5
2018-01-07 00:00:00          0
2018-01-07 04:00:00          1
2018-01-07 08:00:00          2
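The reason the counts can be attached without a merge is visible in what cumcount() returns. The sketch below rebuilds a frame like the one above (using a seeded NumPy generator, an assumption made only so the run is repeatable) and checks that the result is a Series carrying df's own index:

```python
import numpy as np
import pandas as pd

# Sample data shaped like the question's; the seeded generator is an assumption
idx = pd.date_range("2018-01-05", periods=50, freq="4h", name="datetime")
rng = np.random.default_rng(0)
df = pd.DataFrame({"value1": rng.integers(-5, 3, size=50).astype(float)}, index=idx)

# cumcount() numbers the rows within each date group, starting at 0
counts = df.groupby(df.index.date).cumcount()

print(type(counts).__name__)          # Series
print(counts.index.equals(df.index))  # True: same index as df, same row order
print(counts.tolist()[:7])            # [0, 1, 2, 3, 4, 5, 0]
```

Only the `.to_frame('acc_dates')` call turns this Series into a separate one-column DataFrame; the counts themselves are already aligned with df.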

I can merge the two DataFrames on 'datetime' to get the desired output, but I would prefer not to use pd.merge(). Here's the output I expect:
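For comparison, the merge route mentioned here might look like the following sketch, which joins on the shared DatetimeIndex via `left_index=True, right_index=True` (the sample data is regenerated with a seeded generator, an assumption for reproducibility). Note that it builds a new DataFrame rather than changing df:

```python
import numpy as np
import pandas as pd

# Sample data shaped like the question's; the seeded generator is an assumption
idx = pd.date_range("2018-01-05", periods=50, freq="4h", name="datetime")
rng = np.random.default_rng(0)
df = pd.DataFrame({"value1": rng.integers(-5, 3, size=50).astype(float)}, index=idx)

# Build the counts as a one-column frame, as in the question
counts = df.groupby(df.index.date).cumcount().to_frame("acc_dates")

# Merge on the index of both frames; this returns a NEW DataFrame, df is unchanged
merged = df.merge(counts, left_index=True, right_index=True)
print(merged.head(8))
```

`df.join(counts)` is a shorter equivalent here, since `join` matches on the index by default; either way the result has to be reassigned to keep it.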

Out[296]:
                     value1  acc_dates
datetime                              
2018-01-05 00:00:00     1.0          0
2018-01-05 04:00:00    -2.0          1
2018-01-05 08:00:00    -2.0          2
2018-01-05 12:00:00    -3.0          3
2018-01-05 16:00:00     1.0          4
2018-01-05 20:00:00     0.0          5
2018-01-06 00:00:00     2.0          0
2018-01-06 04:00:00    -3.0          1
2018-01-06 08:00:00    -5.0          2
2018-01-06 12:00:00    -5.0          3
2018-01-06 16:00:00     1.0          4
2018-01-06 20:00:00    -2.0          5
2018-01-07 00:00:00     2.0          0
2018-01-07 04:00:00     1.0          1
2018-01-07 08:00:00    -1.0          2
2018-01-07 12:00:00    -2.0          3

Ideally, I would like to create the column directly in the original df, in place.

Is this feasible? I welcome your suggestions.


Solution

  • merge or concat is not necessary; just assign the output to a new column. cumcount() returns a Series with the same index as df, so the values line up row by row:

    df['acc_dates'] = df.groupby(df.index.date).cumcount()
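A runnable sketch of the answer, again on regenerated sample data (the seeded generator is an assumption). It also groups by `df.index.normalize()` instead of `df.index.date`; that alternative key is our addition, not part of the original answer, and should produce the same counts:

```python
import numpy as np
import pandas as pd

# Sample data shaped like the question's; the seeded generator is an assumption
idx = pd.date_range("2018-01-05", periods=50, freq="4h", name="datetime")
rng = np.random.default_rng(0)
df = pd.DataFrame({"value1": rng.integers(-5, 3, size=50).astype(float)}, index=idx)

# The answer's approach: the Series aligns on df's index, and the assignment
# adds the column to df itself (no merge, no copy of df)
df["acc_dates"] = df.groupby(df.index.date).cumcount()
print(df.head(8))

# Alternative key (our choice): normalize() floors each timestamp to midnight,
# keeping datetime64 group keys instead of Python date objects
alt = df.groupby(df.index.normalize()).cumcount()
print(alt.equals(df["acc_dates"]))  # True
```

Because the assignment mutates df directly, there is no `inplace` flag to pass; `df.assign(acc_dates=...)`, by contrast, returns a new DataFrame.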