I have a DataFrame that I want to add to it a new col that states for each id and for each year, how many times there were positive profits in the past. For example:
id year profit
0 1 2018 0
1 1 2019 10
2 1 2020 20
3 1 2021 0
4 2 2018 0
5 2 2019 20
6 2 2020 10
the desired outcome should look like this:
id year profit past_profit
0 1 2018 0 0
1 1 2019 10 0
2 1 2020 20 1
3 1 2021 0 2
4 2 2018 0 0
5 2 2019 20 0
6 2 2020 10 1
Any ideas?
We could use groupby
+ shift
to get past year's data and use groupby
+ cumsum
to get how many times a positive profit happened for each "id".
df['past_profit'] = (df['profit']>0).groupby(df['id']).shift().fillna(False).groupby(df['id']).cumsum()
Output:
id year profit past_profit
0 1 2018 0 0
1 1 2019 10 0
2 1 2020 20 1
3 1 2021 0 2
4 2 2018 0 0
5 2 2019 20 0
6 2 2020 10 1
Note that this assumes that the data is sorted by year
. If not, we could first sort by id and year; then use the above code.
df = df.sort_values(by=['id','year'])