
Seaborn countplot appears to count NaN values wrongly


I'm having trouble getting the right result from a countplot. Consider the following dummy data:

In [111]: import pandas as pd

In [112]: import seaborn as sns

In [113]: import numpy as np

In [114]: data = pd.DataFrame({"A": [np.nan, np.nan, 2], "Cat": [0,1,0], "x":["l", "n", "k"]})

In [115]: data
Out[115]: 
     A  Cat  x
0  NaN    0  l
1  NaN    1  n
2  2.0    0  k

In [116]: sns.countplot(data=data, x="x", hue="Cat")

I would expect the bars for l and n to be zero and the bar for k to show a one. However, my countplot shows a one everywhere. What am I doing wrong? I would like the counts to be taken over column A.



Solution

  • A countplot counts the number of rows per x value (split by hue), so column A is ignored in your call. It looks like you want a barplot after pre-aggregating the data instead:

    sns.barplot(data=data.assign(A=data['A'].notna())
                         .groupby(['x', 'Cat'], as_index=False, sort=False)
                         .sum(),
                x='x', y='A', hue='Cat')
    

    Output:

    seaborn barplot of counts

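    If you want to use a countplot, you could also convert x and Cat to the category dtype and drop the rows where A is NaN. Because the columns are categorical, the categories whose rows were dropped should still appear on the axis, with a count of zero: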

    sns.countplot(data=data.astype({'x': 'category', 'Cat': 'category'})
                           .dropna(subset='A'), x='x', hue='Cat')
    

    Output:

    seaborn countplot with NaNs
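
To check the numbers without plotting, here is a minimal sketch that compares what the countplot tallies with what the question asks for. It reuses the dummy data from the question. The names rows_per_group and non_nan_per_group are mine, chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Dummy data from the question
data = pd.DataFrame({"A": [np.nan, np.nan, 2], "Cat": [0, 1, 0], "x": ["l", "n", "k"]})

# What the countplot in the question tallies: rows per (x, Cat) pair.
# Column A plays no role here, so every group has exactly one row.
rows_per_group = data.groupby(["x", "Cat"], sort=False).size()

# What the question asks for: non-NaN values of A per (x, Cat) pair.
# GroupBy.count() skips NaN, so only the k group contributes.
non_nan_per_group = data.groupby(["x", "Cat"], sort=False)["A"].count()

print(rows_per_group)
print(non_nan_per_group)
```

The second result is the same aggregation as the notna()/sum() version above, written with count(). Calling reset_index() on it gives a frame that could be passed to sns.barplot with x='x', y='A', hue='Cat'.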