I currently have a table similar to this one:
CRED | ACBA
1 | 2
0 | 3
1 | 4
1 | 2
0 | 1
etc...
I was able to get information on the frequency of occurrence of a category (1,2,3,4) in column ACBA depending on the value in CRED (1,0) using:
pd.crosstab(df.CRED, df.ACBA)
ACBA 1 2 3 4
CRED
0 9 11 1 7
1 18 22 4 28
Now I would like to sum the values of ACBA for a specific value of CRED and then be able to divide each single value by that sum and create a new table with the result. Ex:
For CRED = 0 --> 9+11+1+7=28 then --> 9/28 11/28 1/28 7/28 to reach the final table:
1 2 3 4
CRED0 0.25 0.30 0.055 0.38
Does anyone have an idea of how to do this? I am new to Python and completely stuck on this. The idea is that I would repeat this technique across 22 other columns.Thanks
a = {'CRED': [1,0,1,1,0], 'ACBA': [2,3,4,2,1]}
df = pd.DataFrame(a)
output
ACBA CRED
0 2 1
1 3 0
2 4 1
3 2 1
4 1 0
then as what you used crosstab
it
df1 = pd.crosstab(df.CRED, df.ACBA)
ACBA 1 2 3 4
CRED
0 1 0 1 0
1 0 2 0 1
then get the percentage
df1.apply(lambda a: a / a.sum() * 100, axis=1)
ACBA 1 2 3 4
CRED
0 50.0 0.000000 50.0 0.000000
1 0.0 66.666667 0.0 33.333333