I am using the following Python code to analyze how many levels each categorical variable has, so that I can delete variables with more than 53 levels:
df.select_dtypes(['category']).apply(lambda x: len(set(x)))
I receive the following output:
Out[1]:
favorite_drink 35
sex 2
title 12
status 3
dtype: int64
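(For reference, the deletion step itself is just a filter over the column names; a minimal sketch, assuming the 53-level cutoff above:)
# Drop categorical columns with more than 53 observed levels
too_many = [col for col in df.select_dtypes(['category']).columns
            if df[col].nunique() > 53]
df = df.drop(columns=too_many)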
I see that the variable title has 12 levels. I want to analyze the values of those 12 levels, so I use:
df['title'].value_counts()
And I receive hundreds and hundreds of lines of output, listing old values of the variable title
that right now have frequency 0. I am showing just a summary for illustrative purposes:
Out[2]:
...
361xx 0
460xx 0
178xx 0
607xx 0
Name: title, dtype: int64
What I would like is for value_counts() to show only the values whose frequency is above 0. I know np.nan values can be kept with the argument dropna=False, but I haven't seen an equivalent for zero frequencies. I believe this topic is treated here without a solution from pandas.
The dtypes of my variables are:
df.dtypes
Out[3]:
favorite_drink category
sex category
title category
status category
Thanks in advance for your help with an approach to this.
You can simply filter your series:
c = df['title'].value_counts()
c = c[c > 0]  # keep only the values that actually occur
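The zero-count entries appear because value_counts() on a category dtype reports every declared category, even ones no longer present in the data. If you would rather fix the column itself than filter the output, a sketch using the cat.remove_unused_categories() accessor on the categorical Series:
# Drop declared categories that no longer occur in the data
df['title'] = df['title'].cat.remove_unused_categories()
df['title'].value_counts()  # now lists only the 12 observed levels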