Search code examples
python-3.xpandasdataframegroup-by

New column based on values in row and a fixed column value in Pandas Dataframe


I have a dataframe that looks like

Date        col_1   col_2   col_3
2022-08-20  5       B       1
2022-07-21  6       A       1
2022-07-20  2       A       1 
2022-06-15  5       B       1
2022-06-11  3       C       1
2022-06-05  5       C       2
2022-06-01  3       B       2
2022-05-21  6       A       1
2022-05-13  6       A       0
2022-05-10  2       B       3
2022-04-11  2       C       3
2022-03-16  5       A       3
2022-02-20  5       B       1

and i want to add a new column col_new that cumcount the number of rows with the same elements in col_1 and col_2 but excluding that row itself and such that the element in col_3 is 1. So the desired output would look like

Date        col_1   col_2   col_3   col_new
2022-08-20  5       B       1       3
2022-07-21  6       A       1       2
2022-07-20  2       A       1       1
2022-06-15  5       B       1       2
2022-06-11  3       C       1       1
2022-06-05  5       C       2       0
2022-06-01  3       B       2       0
2022-05-21  6       A       1       1
2022-05-13  6       A       0       0
2022-05-10  2       B       3       0
2022-04-11  2       C       3       0
2022-03-16  5       A       3       0
2022-02-20  5       B       1       1

And here's what I have tried:

Date = pd.to_datetime(df['Date'], dayfirst=True)
list_col_3_is_1 = (df
                   .assign(Date=Date)
                   .sort_values('Date', ascending=True)
                   ['col_3'].eq(1))
df['col_new'] = (list_col_3_is_1.groupby(df[['col_1','col_2']]).apply(lambda g: g.shift(1, fill_value=0).cumsum()))

But then I got the following error: ValueError: Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional Thanks in advance.


Solution

  • Your solution should be changed:

    df['col_new'] = list_col_3_is_1.groupby([df['col_1'],df['col_2']]).cumsum()
    
    print (df)
             Date  col_1 col_2  col_3  col_new
    0  2022-08-20      5     B      1        3
    1  2022-07-21      6     A      1        2
    2  2022-07-20      2     A      1        1
    3  2022-06-15      5     B      1        2
    4  2022-06-11      3     C      1        1
    5  2022-06-05      5     C      2        0
    6  2022-06-01      3     B      2        0
    7  2022-05-21      6     A      1        1
    8  2022-05-13      6     A      0        0
    9  2022-05-10      2     B      3        0
    10 2022-04-11      2     C      3        0
    11 2022-03-16      5     A      3        0
    12 2022-02-20      5     B      1        1