Counting the occurrences of a particular item in a column


    ctr = df['gender'].value_counts()
    ctr

**Output:**

    female       102
    male          83
    nonbinary      5
    trans          2
    Name: gender, dtype: int64

This is the output I got, and I understand that this is how the output should look. But I'm interested in the frequency of a particular item in the 'gender' column, for example female or male, and I want to store each of those frequencies in two separate variables. I'm relatively new to this field, so any help is appreciated.


Solution

  • The simplest way to do this is with value_counts().

    As you already know, df['gender'].value_counts() gives you the whole distribution as a Series indexed by the gender labels. You can then pick out the count for a specific label from that Series.

    import pandas as pd

    df = pd.DataFrame({'gender': ['male', 'female', 'trans', 'nonbinary',
                                  'male', 'female', 'male', 'female',
                                  'male', 'female', 'male', 'female',
                                  'male', 'female', 'female', 'trans', 'nonbinary',
                                  'male', 'male', 'male', 'trans',
                                  'male', 'male', 'female', 'female',
                                  'male', 'male', 'nonbinary', 'trans', 'male']})

    # Compute the counts once; the result is a Series indexed by the gender labels
    counts = df['gender'].value_counts()

    # Pick out single counts by label. Attribute access (counts.male) also works here,
    # but it fails for labels that are not valid Python identifiers or that clash
    # with existing Series attributes.
    m = counts['male']
    f = counts['female']
    t = counts['trans']
    b = counts['nonbinary']

    print(counts)
    print('male      :', m)
    print('female    :', f)
    print('trans     :', t)
    print('nonbinary :', b)
    

    Running this, value_counts() gives:

    male         14
    female        9
    trans         4
    nonbinary     3
    Name: gender, dtype: int64
    

    The individual print calls give:

    male      : 14
    female    : 9
    trans     : 4
    nonbinary : 3
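One caveat with looking a label up directly: value_counts() only contains labels that actually occur in the column, so asking for a label that is absent (e.g. counts['agender']) raises a KeyError. The sketch below, using a small hypothetical DataFrame rather than the asker's data, shows `Series.get()` with a default of 0 for that case, plus an alternative that skips value_counts() and counts one label by comparing and summing the resulting booleans.

```python
import pandas as pd

# Small hypothetical sample frame, not the asker's real data
df = pd.DataFrame({'gender': ['male', 'female', 'female', 'trans', 'male', 'female']})

counts = df['gender'].value_counts()

# .get() returns the default instead of raising KeyError when a label is absent
female_count = counts.get('female', 0)
agender_count = counts.get('agender', 0)  # 'agender' does not occur in this data

# Alternative without value_counts(): True counts as 1 when the booleans are summed
male_count = (df['gender'] == 'male').sum()

print('female  :', female_count)
print('agender :', agender_count)
print('male    :', male_count)
```

The boolean-sum form is handy when you only ever need one or two labels; value_counts() is the better fit when you want the whole distribution anyway.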