Tags: python, pandas, categories, frequency, categorical-data

Not showing 0 counts in Pandas category frequency tables


I am using the following Python code to analyze the number of levels a categorical variable has, and delete variables that have more than 53 levels:

df.select_dtypes(['category']).apply(lambda x: len(set(x)))

I receive the following output:

Out[1]:
favorite_drink         35
sex                     2
title                  12
status                  3
dtype: int64

I see that the variable title has 12 levels. I want to analyze the value of those 12 levels, so I use:

df['title'].value_counts()

Instead, I receive hundreds and hundreds of lines of output for previous values of the variable title that now have a frequency of 0. I am showing just a summary for illustrative purposes:

Out[2]:
...
361xx                          0
460xx                          0
178xx                          0
607xx                          0
Name: title, dtype: int64
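
For context, here is a minimal sketch of how zero-frequency rows like these can arise. It assumes the zero-count levels belong to rows that were filtered out after the column was converted to a categorical; the data and variable names are hypothetical and not taken from the original DataFrame.

```python
import pandas as pd

# Hypothetical data: 'title' is a categorical column with four levels.
df_demo = pd.DataFrame({"title": pd.Categorical(["Mr", "Mrs", "Dr", "Ms", "Mr"])})

# Filtering rows keeps the full list of categories on the column,
# even though "Dr" and "Ms" no longer appear in the data.
subset = df_demo[df_demo["title"].isin(["Mr", "Mrs"])]

# value_counts() on a categorical column reports every category,
# including those that are no longer observed (with a count of 0).
counts = subset["title"].value_counts()
print(counts)

# len(set(...)) only sees values that are actually present,
# so it reports fewer levels than value_counts() lists.
print(len(set(subset["title"])))
```

This would explain why len(set(x)) reports 12 levels while value_counts() lists many more.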

What I would like is for value_counts() to show only the values whose frequency is above 0. I know it has the argument dropna=False for np.nan values, but I haven't seen an equivalent for zero frequencies. I believe this topic is discussed here, without a solution from pandas.

The dtypes of my variables are:

df.dtypes

Out[3]:
favorite_drink            category
sex                       category
title                     category
status                    category

Thanks in advance for any help with this.


Solution

  • You can simply filter the resulting series to keep only the non-zero counts:

    c = df['title'].value_counts()
    c = c[c > 0]
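
As a possible alternative, the unused categories can be dropped from the column itself with Series.cat.remove_unused_categories(), so that later calls to value_counts() no longer list them. The sketch below compares both approaches on hypothetical sample data (the series s and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical categorical series: four declared categories, two observed.
s = pd.Series(
    pd.Categorical(["Mr", "Mrs", "Mr"], categories=["Mr", "Mrs", "Dr", "Ms"]),
    name="title",
)

# Option 1: filter the counts, as in the answer above.
c = s.value_counts()
nonzero = c[c > 0]

# Option 2: drop the unused categories first, then count.
trimmed = s.cat.remove_unused_categories()
trimmed_counts = trimmed.value_counts()

print(nonzero)
print(trimmed_counts)
```

Option 1 leaves the column unchanged and only trims the displayed table. Option 2 returns a new series with fewer categories, so it has to be assigned back (for example to df['title']) if the change should persist.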