Search code examples
pythonpandasgroup-bycountmultiple-conditions

Groupby with multiple conditions and count


Hi would be great to get some help on this for Python Pandas.TIA I have a dataframe which has 1M rows with below columns:

PID lurn_fls locality Comparision
ACT933 2 Kambah mbn:match both non-empty
ACT934 3F Charwood xne:mismatch neither empty
ACT935 3R Glenden mbe:match both empty
. . . .
. . . .
ACT155 4 Glebe xhe:mismatch h_empty

I need to do a groupby "lurn_fls" and also count the different 'Comparision' column results per group, so that my result should look like below table. for eg:

lurn_fls mbn:match both non-empty xhe:mismatch h_empty xne:mismatch neither empty Total
1 600 12 15 XXX
2 700 10 14 XXX
3F 800 8 10 XXX
3R 900 6 12 XXX
4 500 4 10 XXX
5 400 2 14 XXX

Solution

  • IIUC, you can use pd.crosstab:

    out = pd.crosstab(df["lurn_fls"], df["Comparision"])
    out["Total"] = out.sum(axis=1)
    
    print(out)
    

    Prints:

    Comparision  mbe:match both empty  mbn:match both non-empty  xhe:mismatch h_empty  xne:mismatch neither empty  Total
    lurn_fls                                                                                                            
    2                               0                         1                     0                           0      1
    3F                              0                         0                     0                           1      1
    3R                              1                         0                     0                           0      1
    4                               0                         0                     1                           0      1