I have a dataset like this
id | january | february | march | april |
---|---|---|---|---|
1 | scheduled | done | null | done |
2 | scheduled | scheduled | done | |
3 | ongoing | canceled | scheduled |
I desire to transform this dataset in a matrix like this in output, where each cell are the occurrences of the exact intersection (keeping Null Values).
event | january | february | march | april |
---|---|---|---|---|
scheduled | 2 | 1 | 1 | 0 |
done | 0 | 1 | 1 | 1 |
ongoing | 1 | 0 | 0 | 0 |
null | 0 | 0 | 0 | 2 |
What you want cannot be done with pivot
because your dataset is not in the standard structure. According to your dataset provide, for each month column, you use groupby
function to count for the number of occurrences.
Sample data:
df = pd.DataFrame({'id':[1,2,3],
'jan':['schedule', 'schedule', 'ongoing'],
'feb':['done', 'schedule', 'canceled'],
'mar':['null', 'done', 'schedule'],
'apr':['done', np.NaN, np.NaN]})
Output:
temdf = pd.DataFrame()
for i in ['jan', 'feb', 'mar', 'apr']:
tem = df.groupby(f'{i}').size().reset_index().rename(columns={f'{i}':'event', 0:f'{i}'}).set_index('event')
temdf = pd.concat([temdf, tem], axis=1)
temdf.reset_index(inplace=True)
temdf
event jan feb mar apr
0 ongoing 1.0 NaN NaN NaN
1 schedule 2.0 1.0 1.0 NaN
2 canceled NaN 1.0 NaN NaN
3 done NaN 1.0 1.0 1.0
4 null NaN NaN 1.0 NaN