python list pandas dataframe frequency-distribution

How to find the frequency distribution in a DataFrame with respect to a particular column using pandas in Python


I have a DataFrame, df, like this:

A   B   C   D   Final
a   b   c   d   Valid
a       c       Valid
a       c   d   Valid
a               Valid

I want to calculate how many values each column contains (and what percentage that is) relative to the Final column.

My desired output is,

output = a=4,b=1,c=3,d=2

Please help


Solution

  • If the empty cells are missing values (NaN), use drop with count:

    print (df)
       A    B    C    D  Final
    0  a    b    c    d  Valid
    1  a  NaN    c  NaN  Valid
    2  a  NaN    c    d  Valid
    3  a  NaN  NaN  NaN  Valid
    
    counts = df.drop('Final', axis=1).count()
    print (counts)
    A    4
    B    1
    C    3
    D    2
    dtype: int64
    

    If the empty cells are empty strings, first compare with ne (not equal) and sum the Trues:

    print (df)
       A  B  C  D  Final
    0  a  b  c  d  Valid
    1  a     c     Valid
    2  a     c  d  Valid
    3  a           Valid
    
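    # Store the result under a new name so df still refers to the DataFrame
    counts = df.drop('Final', axis=1).ne('').sum()
    print (counts)
    A    4
    B    1
    C    3
    D    2
    dtype: int64
    

    print (counts.to_dict())
    {'A': 4, 'B': 1, 'C': 3, 'D': 2}
    
    # Divide by the number of rows in the original DataFrame to get percentages
    d = counts.div(len(df.index)).mul(100).to_dict()
    print (d)
    {'A': 100.0, 'B': 25.0, 'C': 75.0, 'D': 50.0}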
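
The two cases above can also be handled in one pass. The following is a minimal, self-contained sketch, not part of the original answer. It assumes the blank cells may be a mix of missing values and empty strings; the sample data and the names `values`, `filled`, `counts`, `percent`, and `summary` are illustrative only.

```python
import pandas as pd

# Sample data modelled on the question. As an assumption for illustration,
# blank cells are a mix of None (missing) and '' (empty string).
df = pd.DataFrame({
    "A": ["a", "a", "a", "a"],
    "B": ["b", None, "", None],
    "C": ["c", "c", "c", ""],
    "D": ["d", "", "d", None],
    "Final": ["Valid"] * 4,
})

values = df.drop("Final", axis=1)

# A cell counts as filled only if it is neither missing nor an empty string.
filled = values.notna() & values.ne("")

counts = filled.sum()             # filled cells per column
percent = filled.mean().mul(100)  # share of rows that are filled, in percent

# Build a string in the style of the question's desired output,
# keyed by column name.
summary = ",".join(f"{col}={n}" for col, n in counts.items())

print(counts.to_dict())
print(percent.to_dict())
print(summary)
```

Taking the mean of the boolean mask gives the fraction of filled rows directly, so the percentage does not depend on a separately computed row count.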