I've sample data in the form :
week year flag_1 flag_2
26 2022 0 0
27 2022 1 0
28 2022 0 0
2 2023 0 1
3 2023 1 0
4 2023 0 0
5 2023 1 1
6 2023 0 1
7 2023 0 0
8 2023 0 0
9 2023 0 0
10 2023 0 1
11 2023 0 1
I want to create two new columns span_flag_1, span_flag_2.
span_flag_1 will contain 1 if flag_1 == 1, and for all the other rows, span_flag_1 will be max(num weeks since last flag_1 = 1, num weeks till next flag_1 = 1)
How can i do the same?
You can achieve this this way with pandas
import pandas as pd
data = {
"week": [26, 27, 28, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
"year": [2022, 2022, 2022, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023],
"flag_1": [0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0],
"flag_2": [0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1]
}
df = pd.DataFrame(data)
df['overall_week'] = df['year'] * 52 + df['week']
df['since_last_flag_1'] = df[df['flag_1'] == 1]['overall_week']
df['since_last_flag_1'] = df['since_last_flag_1'].fillna(method='ffill')
df['since_last_flag_1'] = df['overall_week'] - df['since_last_flag_1']
df['till_next_flag_1'] = df[df['flag_1'] == 1]['overall_week']
df['till_next_flag_1'] = df['till_next_flag_1'].fillna(method='bfill')
df['till_next_flag_1'] = df['till_next_flag_1'] - df['overall_week']
df['span_flag_1'] = df[['since_last_flag_1', 'till_next_flag_1']].max(axis=1)
df.loc[df['flag_1'] == 1, 'span_flag_1'] = 1
df.drop(columns=['overall_week', 'since_last_flag_1', 'till_next_flag_1'], inplace=True)
print(df)
which gives
week year flag_1 flag_2 span_flag_1
0 26 2022 0 0 1.0
1 27 2022 1 0 1.0
2 28 2022 0 0 27.0
3 2 2023 0 1 27.0
4 3 2023 1 0 1.0
5 4 2023 0 0 1.0
6 5 2023 1 1 1.0
7 6 2023 0 1 1.0
8 7 2023 0 0 2.0
9 8 2023 0 0 3.0
10 9 2023 0 0 4.0
11 10 2023 0 1 5.0
12 11 2023 0 1 6.0