Search code examples
pythondataframeiterationfrequency

Count occurrence of items in column depending on value in other column - Python


I currently have a table similar to this one:

 CRED | ACBA
  1   |  2 
  0   |  3 
  1   |  4 
  1   |  2 
  0   |  1 

etc...

I was able to get information on the frequency of occurrence of a category (1,2,3,4) in column ACBA depending on the value in CRED (1,0) using:

pd.crosstab(df.CRED, df.ACBA)

ACBA   1   2  3   4
CRED               
  0    9  11  1   7
  1   18  22  4  28

Now I would like to sum the values of ACBA for a specific value of CRED and then be able to divide each single value by that sum and create a new table with the result. Ex:

For CRED = 0  -->  9+11+1+7=28 then --> 9/28  11/28  1/28  7/28 to reach the final table:

         1     2     3     4

CRED0  0.25  0.30  0.055  0.38

Does anyone have an idea of how to do this? I am new to Python and completely stuck on this. The idea is that I would repeat this technique across 22 other columns.Thanks


Solution

  • a = {'CRED': [1,0,1,1,0], 'ACBA': [2,3,4,2,1]}
    df = pd.DataFrame(a)
    

    output

       ACBA  CRED
    0     2     1
    1     3     0
    2     4     1
    3     2     1
    4     1     0
    

    then as what you used crosstab it

       df1 = pd.crosstab(df.CRED, df.ACBA)
    ACBA  1  2  3  4
    CRED            
    0     1  0  1  0
    1     0  2  0  1
    

    then get the percentage

    df1.apply(lambda a: a / a.sum() * 100, axis=1)
    
    ACBA     1          2     3          4
    CRED                                  
    0     50.0   0.000000  50.0   0.000000
    1      0.0  66.666667   0.0  33.333333