Search code examples
pythongroup-bypivot-table

Crosstab (contingency table, or similar)


I have a dataset like this

id january february march april
1 scheduled done null done
2 scheduled scheduled done
3 ongoing canceled scheduled

I desire to transform this dataset in a matrix like this in output, where each cell are the occurrences of the exact intersection (keeping Null Values).

event january february march april
scheduled 2 1 1 0
done 0 1 1 1
ongoing 1 0 0 0
null 0 0 0 2

Solution

  • What you want cannot be done with pivot because your dataset is not in the standard structure. According to your dataset provide, for each month column, you use groupby function to count for the number of occurrences.

    Sample data:

    df = pd.DataFrame({'id':[1,2,3],
                       'jan':['schedule', 'schedule', 'ongoing'],
                       'feb':['done', 'schedule', 'canceled'],
                       'mar':['null', 'done', 'schedule'],
                       'apr':['done', np.NaN, np.NaN]})
    

    Output:

    temdf = pd.DataFrame()
    
    for i in ['jan', 'feb', 'mar', 'apr']:
        tem = df.groupby(f'{i}').size().reset_index().rename(columns={f'{i}':'event', 0:f'{i}'}).set_index('event')
        temdf = pd.concat([temdf, tem], axis=1)
    
    temdf.reset_index(inplace=True)
    
    temdf
    
    
          event  jan  feb  mar  apr
    0   ongoing  1.0  NaN  NaN  NaN
    1  schedule  2.0  1.0  1.0  NaN
    2  canceled  NaN  1.0  NaN  NaN
    3      done  NaN  1.0  1.0  1.0
    4      null  NaN  NaN  1.0  NaN