I would like to add a new column to my dataframe that contains the most recent 'revenue' value where 'promotion' == 1, excluding the current row. The dataframe will always be sorted by 'day' in descending order. For rows near the bottom of the dataframe where there is not a previous row with 'promotion' == 1, it should ideally return nothing (but this is not a hard requirement as we will generally be most interested in recent records).
So if this is my dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame({'day':[10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
'revenue':[12000, 13000, 17000, 14000, 16000,
15000, 19000, 17000, 12000, 11000],
'promotion':[0, 0, 1, 1, 0, 0, 1, 0, 1, 0]})
The new column would look like:
df['last_promo_rev'] = [17000, 17000, 14000, 19000, 19000, 19000, 12000, 12000, np.nan, np.nan]
This might work. We can use shift. This starts with your basic DataFrame example.
# Create column with valid promotion values set
df.loc[:, 'last_promo_rev'] = df.loc[df["promotion"] == 1, "revenue"]
# Shift the column by -1 to move the row up one, then backfill with revenue values.
df.loc[:, 'last_promo_rev'] = df.loc[:, "last_promo_rev"].shift(-1).bfill(axis="rows")
Output:
day revenue promotion last_promo_rev
0 10 12000 0 17000.0
1 9 13000 0 17000.0
2 8 17000 1 14000.0
3 7 14000 1 19000.0
4 6 16000 0 19000.0
5 5 15000 0 19000.0
6 4 19000 1 12000.0
7 3 17000 0 12000.0
8 2 12000 1 NaN
9 1 11000 0 NaN