Search code examples
pythonpython-3.xpandasdataframepandas-groupby

count by id with dynamic criteria


I have a DataFrame that I want to add to it a new col that states for each id and for each year, how many times there were positive profits in the past. For example:

   id  year  profit
0   1  2018       0
1   1  2019      10
2   1  2020      20
3   1  2021       0
4   2  2018       0
5   2  2019      20
6   2  2020      10

the desired outcome should look like this:

   id  year  profit  past_profit
0   1  2018       0            0
1   1  2019      10            0
2   1  2020      20            1
3   1  2021       0            2
4   2  2018       0            0
5   2  2019      20            0
6   2  2020      10            1

Any ideas?


Solution

  • We could use groupby + shift to get past year's data and use groupby + cumsum to get how many times a positive profit happened for each "id".

    df['past_profit'] = (df['profit']>0).groupby(df['id']).shift().fillna(False).groupby(df['id']).cumsum()
    

    Output:

       id  year  profit  past_profit
    0   1  2018       0            0
    1   1  2019      10            0
    2   1  2020      20            1
    3   1  2021       0            2
    4   2  2018       0            0
    5   2  2019      20            0
    6   2  2020      10            1
    

    Note that this assumes that the data is sorted by year. If not, we could first sort by id and year; then use the above code.

    df = df.sort_values(by=['id','year'])