I have a dataframe that looks like
Date col_1 col_2 col_3
2022-08-20 5 B 1
2022-07-21 6 A 1
2022-07-20 2 A 1
2022-06-15 5 B 1
2022-06-11 3 C 1
2022-06-05 5 C 2
2022-06-01 3 B 2
2022-05-21 6 A 1
2022-05-13 6 A 0
2022-05-10 2 B 3
2022-04-11 2 C 3
2022-03-16 5 A 3
2022-02-20 5 B 1
and i want to add a new column col_new
that cumcount
the number of rows with the same elements in col_1
and col_2
but excluding that row itself and such that the element in col_3
is 1. So the desired output would look like
Date col_1 col_2 col_3 col_new
2022-08-20 5 B 1 3
2022-07-21 6 A 1 2
2022-07-20 2 A 1 1
2022-06-15 5 B 1 2
2022-06-11 3 C 1 1
2022-06-05 5 C 2 0
2022-06-01 3 B 2 0
2022-05-21 6 A 1 1
2022-05-13 6 A 0 0
2022-05-10 2 B 3 0
2022-04-11 2 C 3 0
2022-03-16 5 A 3 0
2022-02-20 5 B 1 1
And here's what I have tried:
Date = pd.to_datetime(df['Date'], dayfirst=True)
list_col_3_is_1 = (df
.assign(Date=Date)
.sort_values('Date', ascending=True)
['col_3'].eq(1))
df['col_new'] = (list_col_3_is_1.groupby(df[['col_1','col_2']]).apply(lambda g: g.shift(1, fill_value=0).cumsum()))
But then I got the following error: ValueError: Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional Thanks in advance.
Your solution should be changed:
df['col_new'] = list_col_3_is_1.groupby([df['col_1'],df['col_2']]).cumsum()
print (df)
Date col_1 col_2 col_3 col_new
0 2022-08-20 5 B 1 3
1 2022-07-21 6 A 1 2
2 2022-07-20 2 A 1 1
3 2022-06-15 5 B 1 2
4 2022-06-11 3 C 1 1
5 2022-06-05 5 C 2 0
6 2022-06-01 3 B 2 0
7 2022-05-21 6 A 1 1
8 2022-05-13 6 A 0 0
9 2022-05-10 2 B 3 0
10 2022-04-11 2 C 3 0
11 2022-03-16 5 A 3 0
12 2022-02-20 5 B 1 1