I have a dataframe of event occurrences in which the date is formatted as 'YYYY-WW'. Various events can take place, some at the same time as other events in the same timeframe. Example datframe as follows;
df1 = pd.DataFrame({'yyyyww': ['2022-01','2022-02','2022-03', '2022-01','2022-02','2022-03','2022-01','2022-03'],
'event': ['event1','event1','event1','event2','event2','event3','event4','event4'],
'event_flag': [1,1,1,1,1,1,1,1,]})
I have a 2nd dataframe to which I want to left join the 1st dataframe. The 2nd dataframe could potentially contain many more dates than is featured in df1 but the for the purposes of this question is as follows:
df2 = pd.DataFrame({'yyyyww': ['2022-01','2022-02','2022-03'],
'col1': ['apple','car','banana']})
Ultimately, I want to perform a left join such that the values from event
in df1 become additional column headers in df2, with the event_flag
from df1 becoming a boolean value under the respective column header, as follows:
desired_outcome = pd.DataFrame({'yyyyww': ['2022-01','2022-02','2022-03'],
'col1': ['apple','car','banana'],
'event1':[1,1,1],
'event2':[1,1,0],
'event3':[0,0,1],
'event4':[1,0,1],
})
However, when using iterrows()
to achieve this, what I end up with is something that bears some resemblance to the desired outcome but duplicates the columns such that I end up with multiple columns with suffixes, as follows:
for index, row in df1.iterrows():
index_value = row['event']
#column_a_value = row['disco']
yyyyww = row['yyyyww']
event_flag = row['event_flag']
df2 = df2.merge(pd.DataFrame({'yyyyww': [yyyyww],
f'{index_value}': [event_flag]
}),
left_on='yyyyww', right_on='yyyyww', how='left')
df2.fillna(0)
How can I perform the required operation without resulting in duplicated columns?
After .merge
you can .pivot
the dataframe to obtain the final form:
out = (
df2.merge(df1, on="yyyyww")
.pivot(index=["yyyyww", "col1"], columns="event", values="event_flag")
.fillna(0)
.astype(int)
.reset_index()
.rename_axis(columns=None, index=None)
)
print(out)
Prints:
yyyyww col1 event1 event2 event3 event4
0 2022-01 apple 1 1 0 1
1 2022-02 car 1 1 0 0
2 2022-03 banana 1 0 1 1