Hi would be great to get some help on this for Python Pandas.TIA I have a dataframe which has 1M rows with below columns:
PID | lurn_fls | locality | Comparision |
---|---|---|---|
ACT933 | 2 | Kambah | mbn:match both non-empty |
ACT934 | 3F | Charwood | xne:mismatch neither empty |
ACT935 | 3R | Glenden | mbe:match both empty |
. | . | . | . |
. | . | . | . |
ACT155 | 4 | Glebe | xhe:mismatch h_empty |
I need to do a groupby "lurn_fls" and also count the different 'Comparision' column results per group, so that my result should look like below table. for eg:
lurn_fls | mbn:match both non-empty | xhe:mismatch h_empty | xne:mismatch neither empty | Total |
---|---|---|---|---|
1 | 600 | 12 | 15 | XXX |
2 | 700 | 10 | 14 | XXX |
3F | 800 | 8 | 10 | XXX |
3R | 900 | 6 | 12 | XXX |
4 | 500 | 4 | 10 | XXX |
5 | 400 | 2 | 14 | XXX |
IIUC, you can use pd.crosstab
:
out = pd.crosstab(df["lurn_fls"], df["Comparision"])
out["Total"] = out.sum(axis=1)
print(out)
Prints:
Comparision mbe:match both empty mbn:match both non-empty xhe:mismatch h_empty xne:mismatch neither empty Total
lurn_fls
2 0 1 0 0 1
3F 0 0 0 1 1
3R 1 0 0 0 1
4 0 0 1 0 1