I would like to add a column (with header 'acc_dates') which would give the increasing count of the dates in a dataframe with datetime index. Here is an example:
import pandas as pd
import datetime as dt
datarange=pd.date_range('01-05-2018 00:00:00', periods=50, freq="4H")
range_series_1=pd.Series(np.random.randint(-5,3,size=50).astype(float), index=datarange)
df=pd.DataFrame({'value1':range_series_1})
df.head(5)
Out[287]:
value1
datetime
2018-01-05 00:00:00 1.0
2018-01-05 04:00:00 -2.0
2018-01-05 08:00:00 -2.0
2018-01-05 12:00:00 -3.0
2018-01-05 16:00:00 1.0
If I apply the cumcount()
, the 'value1' column disappears. This is what I type and what I get:
df.groupby(df.index.date).cumcount().to_frame('acc_dates').head(15)
Out[288]:
acc_dates
datetime
2018-01-05 00:00:00 0
2018-01-05 04:00:00 1
2018-01-05 08:00:00 2
2018-01-05 12:00:00 3
2018-01-05 16:00:00 4
2018-01-05 20:00:00 5
2018-01-06 00:00:00 0
2018-01-06 04:00:00 1
2018-01-06 08:00:00 2
2018-01-06 12:00:00 3
2018-01-06 16:00:00 4
2018-01-06 20:00:00 5
2018-01-07 00:00:00 0
2018-01-07 04:00:00 1
2018-01-07 08:00:00 2
I can merge the two dataframes on 'datetime' to get the desired output, but I would prefer to not apply the pd.merge() method. Here's the output I expect:
Out[296]:
value1 acc_dates
datetime
2018-01-05 00:00:00 1.0 0
2018-01-05 04:00:00 -2.0 1
2018-01-05 08:00:00 -2.0 2
2018-01-05 12:00:00 -3.0 3
2018-01-05 16:00:00 1.0 4
2018-01-05 20:00:00 0.0 5
2018-01-06 00:00:00 2.0 0
2018-01-06 04:00:00 -3.0 1
2018-01-06 08:00:00 -5.0 2
2018-01-06 12:00:00 -5.0 3
2018-01-06 16:00:00 1.0 4
2018-01-06 20:00:00 -2.0 5
2018-01-07 00:00:00 2.0 0
2018-01-07 04:00:00 1.0 1
2018-01-07 08:00:00 -1.0 2
2018-01-07 12:00:00 -2.0 3
Ideally, I am looking for a method to somehow create and add the column in the initial df inplace
.
Is this feasible? I welcome your suggestions.
I think merge
or concat
are not necessary, only assign output to new column:
df['acc_dates'] = df.groupby(df.index.date).cumcount()