python pandas seaborn boxplot categorical-data

Hide non observed categories in a seaborn boxplot

I am currently working on a data analysis, and want to show some data distributions through seaborn boxplots.

I have a categorical variable, 'seg1', which can take three values in my dataset ('Z1', 'Z3', 'Z4'). However, the data in group 'Z4' is too exotic for me to report, so I would like to produce boxplots showing only categories 'Z1' and 'Z3'.

Filtering the data source of the plot did not work: category 'Z4' is still shown, with no data points.

Is there any solution other than creating a new CategoricalDtype with only ('Z1', 'Z3') and casting my data to this new dtype?

I would simply like to hide the 'Z4' category.

I am using seaborn 0.10.1 and matplotlib 3.3.1.

Thanks in advance for your answers.

My attempts are below, along with some data to reproduce the problem.

Dummy data

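import pandas as pd
import seaborn as sns

# 'c' is a declared category that never occurs in the data
dummy_cat = pd.CategoricalDtype(['a', 'b', 'c'])
df = pd.DataFrame({'col1': ['a', 'b', 'a', 'b'], 'col2': [12., 5., 3., 2]})
df.col1 = df.col1.astype(dummy_cat)
sns.boxplot(data=df, x='col1', y='col2')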

Apply no filter

fig, axs = plt.subplots(figsize=(8, 25), nrows=len(indicators2), squeeze=False)
for j, indicator in enumerate(indicators2):
    sns.boxplot(data=orders, y=indicator, x='seg1', hue='origin2', ax=axs[j, 0], showfliers=False)

Which produces:

Filter data source

mask_filter = orders.seg1.isin(['Z1', 'Z3'])

fig, axs = plt.subplots(figsize=(8, 25), nrows=len(indicators2), squeeze=False)
for j, indicator in enumerate(indicators2):
    sns.boxplot(data=orders.loc[mask_filter], y=indicator, x='seg1', hue='origin2', ax=axs[j, 0], showfliers=False)

Which produces a plot where 'Z4' still appears on the x-axis, with no data points.

Solution

To cut off the last (or first) x-value, set_xlim() can be used. The categories are drawn at integer positions 0, 1, 2, ..., so ax.set_xlim(-0.5, 1.5), for example, shows only the first two.
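As a minimal sketch of the set_xlim() approach, using the dummy data from the question (the name n_keep is only illustrative, and this assumes the category to hide is the last one on the axis):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Same dummy data as in the question: 'c' is declared but never observed.
dummy_cat = pd.CategoricalDtype(['a', 'b', 'c'])
df = pd.DataFrame({'col1': ['a', 'b', 'a', 'b'], 'col2': [12., 5., 3., 2.]})
df['col1'] = df['col1'].astype(dummy_cat)

fig, ax = plt.subplots()
sns.boxplot(data=df, x='col1', y='col2', ax=ax)

# Categories sit at x = 0, 1, 2; keep the first n_keep of them in view.
n_keep = 2  # illustrative name: number of leading categories to show
ax.set_xlim(-0.5, n_keep - 0.5)
```

Call plt.show() afterwards to display the figure. This only changes the visible range of the axis, so it works when the unwanted category sits at either end, not in the middle.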

Another option is to use seaborn's order= parameter and list only the desired categories. That list can also be built programmatically:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

dummy_cat = pd.CategoricalDtype(['a', 'b', 'c'])
df = pd.DataFrame({'col1': ['a', 'b', 'a', 'b'], 'col2': [12., 5., 3., 2]})
df.col1 = df.col1.astype(dummy_cat)
# Keep only the categories that actually occur (exact match, not a substring/regex match)
order = [cat for cat in dummy_cat.categories if (df['col1'] == cat).any()]
sns.boxplot(data=df, x='col1', y='col2', order=order)
plt.show()
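The order list can also be derived with pandas' Series.cat.remove_unused_categories(), which may help explain why filtering rows alone does not hide 'Z4': the category stays in the dtype even when no row uses it. The sketch below uses a made-up stand-in for the question's orders frame (the 'value' column and its numbers are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the question's `orders` frame; values are made up.
seg_dtype = pd.CategoricalDtype(['Z1', 'Z3', 'Z4'])
orders = pd.DataFrame({
    'seg1': pd.Series(['Z1', 'Z3', 'Z4', 'Z1', 'Z3'], dtype=seg_dtype),
    'value': [1.0, 2.0, 50.0, 1.5, 2.5],
})

# Row filtering drops the 'Z4' rows...
filtered = orders.loc[orders['seg1'].isin(['Z1', 'Z3'])]

# ...but 'Z4' is still a declared category, which is why it stays on the axis.
print(list(filtered['seg1'].cat.categories))  # ['Z1', 'Z3', 'Z4']

# Dropping unused categories leaves only the observed ones, in their original order.
order = list(filtered['seg1'].cat.remove_unused_categories().cat.categories)
print(order)  # ['Z1', 'Z3']
```

The resulting list can then be passed as sns.boxplot(data=filtered, x='seg1', y='value', order=order), which leaves the column's dtype untouched.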