Tags: seaborn, countplot

Countplot of a binary Series takes an excessive amount of time and produces a strange plot


As the screenshot below shows, my y_train Series contains only two values, 0 and 1. Why, then, does countplot take almost 4 minutes to run, and why does the plot it eventually produces have nothing to do with the data?

[Screenshot: countplot]


Solution

  • Pass the Series directly to x:

    sns.countplot(x=y_train)
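
A quick way to confirm the fix is to count the bars that countplot actually drew. The sketch below is a minimal check, assuming seaborn 0.13.x and matplotlib's off-screen Agg backend; the synthetic y_train reuses the 0/1 counts (16548 and 5251) from the workaround example further down, and inspecting ax.patches is our own choice of check, not something seaborn requires.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for y_train: 16548 zeros followed by 5251 ones
y_train = pd.Series(np.repeat([0, 1], [16548, 5251]), name="count")

fig, ax = plt.subplots()
sns.countplot(x=y_train, ax=ax)  # the Series goes to x, not to data

# Each bar is a Rectangle in ax.patches; sort by x position so 0 comes first
bars = sorted(ax.patches, key=lambda bar: bar.get_x())
heights = [bar.get_height() for bar in bars]
plt.close(fig)

print(len(bars), heights)
```

With x set, seaborn treats the input as long-form, so there should be one bar per distinct value, each as tall as that value's count.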
    

    Why did it fail?

    This assumes the latest seaborn release at the time of writing (0.13.2).

    If you pass a Series to data and give neither x nor y, seaborn interprets it as wide-form data. From the documentation of the data parameter:

    data: DataFrame, Series, dict, array, or list of arrays

    Dataset for plotting. If x and y are absent, this is interpreted as wide-form. Otherwise it is expected to be long-form.

    import pandas as pd
    import seaborn as sns
    
    y_train = pd.Series([0, 1, 1, 0], name='count')
    
    # No x or y is given, so the Series is treated as wide-form data
    sns.countplot(data=y_train)
    

    Output:

    [Plot: output of sns.countplot(data=y_train)]

    With your data, this creates more than 20,000 bars, which explains both the long runtime and a plot that looks unrelated to the values.
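
To see how far that is from what a count plot should show, compare the number of observations with the number of distinct values. This pandas-only sketch assumes the same 16548/5251 split used in the workaround example; the "more than 20,000 bars" figure is on the order of the row count, while a correct count plot needs only one bar per distinct value.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for y_train with the counts from the workaround example
y_train = pd.Series(np.repeat([0, 1], [16548, 5251]), name="count")

n_rows = len(y_train)             # number of observations
n_categories = y_train.nunique()  # number of distinct values (bars expected)
expected = y_train.value_counts().sort_index()  # expected bar heights

print(n_rows)             # 21799
print(n_categories)       # 2
print(expected.tolist())  # [16548, 5251]
```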

    Alternatively, convert the Series to a one-column DataFrame with to_frame and pass the column name as x:

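    import numpy as np
    import pandas as pd
    import seaborn as sns
    
    # 16548 zeros followed by 5251 ones
    y_train = pd.Series(np.repeat([0, 1], [16548, 5251]), name='count')
    
    # x is given, so the DataFrame is treated as long-form data
    sns.countplot(data=y_train.to_frame(), x='count')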
    

    [Plot: output of sns.countplot(data=y_train.to_frame(), x='count')]
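
Both approaches should produce the same bars. The sketch below checks that by drawing each variant on its own axes and comparing bar heights; it assumes seaborn 0.13.x and the Agg backend, and the helper bar_heights is a hypothetical name introduced only for this comparison.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

y_train = pd.Series(np.repeat([0, 1], [16548, 5251]), name="count")


def bar_heights(ax):
    """Hypothetical helper: bar heights on ax, ordered left to right."""
    bars = sorted(ax.patches, key=lambda bar: bar.get_x())
    return [bar.get_height() for bar in bars]


fig, (ax_series, ax_frame) = plt.subplots(1, 2)
sns.countplot(x=y_train, ax=ax_series)                          # Series passed to x
sns.countplot(data=y_train.to_frame(), x="count", ax=ax_frame)  # long-form DataFrame

series_heights = bar_heights(ax_series)
frame_heights = bar_heights(ax_frame)
plt.close(fig)

print(series_heights == frame_heights)
```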