
Hide non observed categories in a seaborn boxplot


I am currently working on a data analysis and want to show some data distributions with seaborn boxplots.

I have a categorical column, 'seg1', which in my dataset can take three values ('Z1', 'Z3', 'Z4'). However, the data in group 'Z4' is too exotic for me to report, and I would like to produce boxplots showing only categories 'Z1' and 'Z3'.

Filtering the plot's data source did not work: category 'Z4' is still shown, just with no data points.

Is there any solution other than creating a new CategoricalDtype with only ('Z1', 'Z3') and casting my data to this new category?

I would simply like to hide the 'Z4' category.

I am using seaborn 0.10.1 and matplotlib 3.3.1.

Thanks in advance for your answers.

My attempts are below, along with some dummy data to reproduce the issue.

Dummy data

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
dummy_cat = pd.CategoricalDtype(['a', 'b', 'c'])
df = pd.DataFrame({'col1': ['a', 'b', 'a', 'b'], 'col2': [12., 5., 3., 2.]})
df['col1'] = df['col1'].astype(dummy_cat)
sns.boxplot(data=df, x='col1', y='col2')
plt.show()

[Plot: dummy data]

Apply no filter

# `orders` (my dataset) and `indicators2` (the list of columns to plot) are not shown here
fig, axs = plt.subplots(figsize=(8, 25), nrows=len(indicators2), squeeze=False)
for j, indicator in enumerate(indicators2):
    sns.boxplot(data=orders, y=indicator, x='seg1', hue='origin2', ax=axs[j, 0], showfliers=False)

Which produces:

[Plot: non-filtered data]

Filter data source

mask_filter = orders.seg1.isin(['Z1', 'Z3'])

fig, axs = plt.subplots(figsize=(8, 25), nrows=len(indicators2), squeeze=False)
for j, indicator in enumerate(indicators2):
    sns.boxplot(data=orders.loc[mask_filter], y=indicator, x='seg1', hue='origin2', ax=axs[j, 0], showfliers=False)

Which produces:

[Plot: filtered data source]
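The filtering attempt still shows 'Z4' because boolean indexing keeps the column's CategoricalDtype, and seaborn takes its default order for a categorical column from the full list of categories. A minimal pandas-only sketch of that behaviour, using hypothetical data in which category 'c' has rows and plays the role of 'Z4':

```python
import pandas as pd

# Hypothetical data in which 'c' has observations, playing the role of 'Z4'
dummy_cat = pd.CategoricalDtype(['a', 'b', 'c'])
df = pd.DataFrame({'col1': ['a', 'b', 'a', 'c'], 'col2': [12., 5., 3., 2.]})
df['col1'] = df['col1'].astype(dummy_cat)

# Boolean filtering drops the 'c' rows but keeps the column's dtype,
# so 'c' is still one of the declared categories
filtered = df.loc[df['col1'].isin(['a', 'b'])]
print(list(filtered['col1'].cat.categories))  # ['a', 'b', 'c']

# remove_unused_categories() drops categories that no longer have any rows
trimmed = filtered['col1'].cat.remove_unused_categories()
print(list(trimmed.cat.categories))  # ['a', 'b']
```

Plotting `filtered.assign(col1=trimmed)` instead of `filtered` should then leave no empty slot for 'c'; this is an alternative to the order= approach, not part of the original answer.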


Solution

  • Because seaborn places the categories at x = 0, 1, 2, etc., set_xlim() can be used to cut off the last (or first) category, e.g. ax.set_xlim(-0.5, 1.5) keeps only the first two.

    Another option is to use seaborn's order= parameter and list only the desired values. Optionally, that list can be created programmatically:

    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns
    
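    dummy_cat = pd.CategoricalDtype(['a', 'b', 'c'])
    df = pd.DataFrame({'col1': ['a', 'b', 'a', 'b'], 'col2': [12., 5., 3., 2.]})
    df['col1'] = df['col1'].astype(dummy_cat)
    # Keep only the categories that actually occur in the data, in category order.
    # An exact comparison avoids substring matches (e.g. 'Z1' inside 'Z10') that .str.contains() allows.
    order = [cat for cat in dummy_cat.categories if (df['col1'] == cat).any()]
    sns.boxplot(data=df, x='col1', y='col2', order=order)
    plt.show()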
    

    [Plot: example plot]
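The set_xlim() option from the first bullet can be sketched on the question's dummy data. It assumes seaborn draws the categorical boxes at integer positions 0, 1, 2 in category order, which is how seaborn lays out a categorical axis by default:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Same dummy data as in the question: 'c' is a declared category with no rows
dummy_cat = pd.CategoricalDtype(['a', 'b', 'c'])
df = pd.DataFrame({'col1': ['a', 'b', 'a', 'b'], 'col2': [12., 5., 3., 2.]})
df['col1'] = df['col1'].astype(dummy_cat)

# Seaborn draws the categories at x = 0, 1, 2 in category order
ax = sns.boxplot(data=df, x='col1', y='col2')

# Restrict the view to the slots of 'a' (x = 0) and 'b' (x = 1); 'c' at x = 2 is cut off
ax.set_xlim(-0.5, 1.5)
plt.show()
```

Because this only narrows the visible range, it suits an unwanted category at either end of the order; a category in the middle needs order= instead.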