I am currently working on a data analysis, and want to show some data distributions through seaborn boxplots.
I have a categorical column, 'seg1', which can take 3 values in my dataset ('Z1', 'Z3', 'Z4'). However, the data in group 'Z4' is too exotic for me to report, and I would like to produce boxplots showing only categories 'Z1' and 'Z3'.
Filtering the data source of the plot did not work, as category 'Z4' is still shown, just with no data points.
Is there any other solution than creating a new CategoricalDtype with only ('Z1', 'Z3') and casting my data to this new category?
I would simply like to hide the 'Z4' category.
I am using seaborn 0.10.1 and matplotlib 3.3.1.
Thanks in advance for your answers.
My attempts are below, along with some data to reproduce the problem.
Dummy data
import pandas as pd
import seaborn as sns
dummy_cat = pd.CategoricalDtype(['a', 'b', 'c'])
df = pd.DataFrame({'col1': ['a', 'b', 'a', 'b'], 'col2': [12., 5., 3., 2]})
df.col1 = df.col1.astype(dummy_cat)
sns.boxplot(data=df, x='col1', y='col2')
Apply no filter
fig, axs = plt.subplots(figsize=(8, 25), nrows=len(indicators2), squeeze=False)
for j, indicator in enumerate(indicators2):
    sns.boxplot(data=orders, y=indicator, x='seg1', hue='origin2', ax=axs[j, 0], showfliers=False)
Which produces:
Filter data source
mask_filter = orders.seg1.isin(['Z1', 'Z3'])
fig, axs = plt.subplots(figsize=(8, 25), nrows=len(indicators2), squeeze=False)
for j, indicator in enumerate(indicators2):
    sns.boxplot(data=orders.loc[mask_filter], y=indicator, x='seg1', hue='origin2', ax=axs[j, 0], showfliers=False)
Which produces:
To cut off the last (or first) x-value, set_xlim() can be used, e.g. ax.set_xlim(-0.5, 1.5).
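As a minimal sketch of the set_xlim() approach on the dummy data from the question: seaborn draws each category at an integer position (0 for 'a', 1 for 'b', 2 for the empty 'c'), so limiting the x-axis to (-0.5, 1.5) keeps the first two boxes and cuts off the third. The limits assume the unwanted category sits at the end of the axis.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Dummy data from the question: 'c' is a declared category with no rows
dummy_cat = pd.CategoricalDtype(['a', 'b', 'c'])
df = pd.DataFrame({'col1': ['a', 'b', 'a', 'b'], 'col2': [12., 5., 3., 2.]})
df['col1'] = df['col1'].astype(dummy_cat)

ax = sns.boxplot(data=df, x='col1', y='col2')
# Categories sit at x = 0, 1, 2; stop the axis halfway between 'b' and 'c'
ax.set_xlim(-0.5, 1.5)
```

Call plt.show() afterwards to display the figure. Note that this only hides the category visually; it is still part of the plotted data.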
Another option is to work with seaborn's order= parameter and only add the desired values to that list. Optionally, that list can be created programmatically:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
dummy_cat = pd.CategoricalDtype(['a', 'b', 'c'])
df = pd.DataFrame({'col1': ['a', 'b', 'a', 'b'], 'col2': [12., 5., 3., 2]})
df.col1 = df.col1.astype(dummy_cat)
# keep only the categories that actually occur in the data (exact match, not substring match)
order = [cat for cat in dummy_cat.categories if (df['col1'] == cat).any()]
sns.boxplot(data=df, x='col1', y='col2', order=order)
plt.show()
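If the goal is to filter rows first, as in the question, one further possibility is to drop the now-empty categories from the filtered column with pandas' Series.cat.remove_unused_categories(), which avoids building a new CategoricalDtype by hand. The sketch below is an illustration on made-up data (the extra 'c' row and the variable name filtered are my own assumptions, not from the question):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data: this time 'c' does have a row that we want to exclude
dummy_cat = pd.CategoricalDtype(['a', 'b', 'c'])
df = pd.DataFrame({'col1': ['a', 'b', 'c', 'a', 'b'],
                   'col2': [12., 5., 40., 3., 2.]})
df['col1'] = df['col1'].astype(dummy_cat)

# Filtering rows alone leaves 'c' declared as a category,
# so it is removed explicitly from the filtered copy
filtered = df.loc[df['col1'].isin(['a', 'b'])].copy()
filtered['col1'] = filtered['col1'].cat.remove_unused_categories()

ax = sns.boxplot(data=filtered, x='col1', y='col2')
```

This leaves the original df untouched, since only the filtered copy has its categories changed. Call plt.show() to display the result.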