I have code using seaborn catplot to draw categorical plots onto a FacetGrid. I am using a countplot in the catplot function, hence kind='count'. The col argument of catplot is set to the col_cat variable, which in this context is age_category.
age_category is a column in my df which, as its name suggests, represents age categories. It is an ordered pandas categorical dtype.
My df is as follows:
ipdb> df
                         spirometryResult_category     age_category habits-smoking
_id
63bb97708e5f58ef85f6e4ea                    Normal  20-39 years old            Yes
63bd1b228e5f58ef85f73130                    Normal  20-39 years old            Yes
6423cb1c174e67af0aa0f0fc                    Normal  20-39 years old             No
6423d85e174e67af0aa10cda               Restrictive  20-39 years old             No
6423d8bb174e67af0aa10d98               Obstructive  20-39 years old             No
...                                            ...              ...            ...
6549a0df0941d048fdfd94c4               Obstructive  20-39 years old             No
6549d0ab0941d048fdfd960d                    Normal  40-59 years old             No
6549d0ee0941d048fdfd962b                    Normal  20-39 years old             No
654b17a20941d048fdfda256                    Normal  20-39 years old             No
654d81700941d048fdfdc27d                    Normal  40-59 years old             No
[106 rows x 3 columns]
The age_category column in df is as follows:
ipdb> df['age_category']
_id
63bb97708e5f58ef85f6e4ea 20-39 years old
63bd1b228e5f58ef85f73130 20-39 years old
6423cb1c174e67af0aa0f0fc 20-39 years old
6423d85e174e67af0aa10cda 20-39 years old
6423d8bb174e67af0aa10d98 20-39 years old
...
6549a0df0941d048fdfd94c4 20-39 years old
6549d0ab0941d048fdfd960d 40-59 years old
6549d0ee0941d048fdfd962b 20-39 years old
654b17a20941d048fdfda256 20-39 years old
654d81700941d048fdfdc27d 40-59 years old
Name: age_category, Length: 106, dtype: category
Categories (4, object): ['20-39 years old' < '40-59 years old' < '60-79 years old' < '>= 80 years old']
The distribution of categories in the age_category column is as follows:
ipdb> df['age_category'].value_counts()
age_category
20-39 years old 89
40-59 years old 14
60-79 years old 3
>= 80 years old 0
Name: count, dtype: int64
The number of subjects in the age category of '>= 80 years old' is 0, which gives me problems in plotting its annotations for the bars.
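A minimal sketch of why the empty category is still present, using a small made-up series rather than the real df (the names `age_levels`, `ages` and `age_counts` are for illustration only). A pandas categorical keeps every declared category, whether or not any row uses it, so value_counts() reports it with a count of 0:

```python
import pandas as pd

# Made-up ages for illustration only; this is not the real data.
age_levels = [
    '20-39 years old', '40-59 years old',
    '60-79 years old', '>= 80 years old',
]
ages = pd.Series(
    pd.Categorical(
        ['20-39 years old', '20-39 years old', '40-59 years old'],
        categories=age_levels,
        ordered=True,
    )
)

# value_counts() on a categorical reports every declared category,
# including the ones that have no rows.
age_counts = ages.value_counts()
print(age_counts)

# The declared categories are unchanged even though some are unused.
print(list(ages.cat.categories))
```

Since seaborn takes the facet names from those declared categories, this is presumably why a subplot is created for '>= 80 years old' even though no rows fall into it.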
In general, the code below works. My objective is to plot multiple subplots, one for each age category, showing the subject counts for each combination of spirometryResult_category and habits-smoking.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Getting colours as specified in the config, for each hue category
# Need to remove this hardcoding when I improve the script
colour_map = config['seaborn_colourmaps'][hue_cat]
# Plotting graph
# count refers to param_category counts
plt.subplots(figsize=figsize)
# Not sure why setting axes.labelsize here doesn't
# work
sns.set_context('paper', rc={'font.size': fontsize})
# height=4, aspect=.6,
g = sns.catplot(
    data=df, x=param_category, hue=hue_cat, col=col_cat,
    kind='count', palette=colour_map, col_wrap=wrap_num,
    saturation=1
)
for ax in g.axes:
    ax.tick_params(left=False, labelbottom=True)
    ax.set_xticklabels(ax.get_xticklabels(), size=fontsize)
    # Replacing subplot title if needed
    if col_cat in config['seaborn_alt_names']:
        new_title = config['seaborn_alt_names'][col_cat]
        ax.set_title(ax.get_title().replace(col_cat, new_title), size=fontsize)
    # Auto-label bars
    for container in ax.containers:
        container.datavalues = np.nan_to_num(container.datavalues)
        ax.bar_label(container, fmt='%.0f', padding=2)
# In contrast to prev plotting code, despine goes here, as facetgrid
# requires it to be done this way
g.despine(top=True, right=True, left=True)
# Fine adjustment of aesthetics
g.set(yticklabels=[], ylabel=None, xlabel=None)
g.tick_params('x', rotation=90)
# Checking if legend title is needed
legend = False
if 'legend' in plot_info:
    legend = plot_info['legend']
if not legend:
    # FacetGrid exposes its legend as the g.legend attribute
    g.legend.set_title(None)
else:
    # If an alternative legend title is specified,
    # use that, if not, use the default one
    if hue_cat in config['seaborn_alt_names']:
        new_title = config['seaborn_alt_names'][hue_cat]
        g.legend.set_title(new_title)
# Continuing adjustment of aesthetics
plt.subplots_adjust(hspace=1, wspace=0.3)
g.figure.savefig(filename, bbox_inches='tight')
plt.close()
The output picture is shown here:
As you can see, the ">= 80 years old" category has no subjects, so the text "0" is not drawn at all in its subplot. All other age categories have their bars and annotations drawn correctly. Because ">= 80 years old" has no subjects, ax.containers is an empty list for that subplot, so my loop (for container in ax.containers:) never runs and nothing is annotated.
How do I force seaborn to annotate subplots with 0 counts in the correct location (decided automatically by seaborn, so I don't have to hardcode anything) when the category has 0 subjects and ax.containers is an empty list?
Use pandas.Series.cat.remove_unused_categories to remove empty categories before plotting.
import pandas as pd
import seaborn as sns

# sample data
df = sns.load_dataset('titanic')
# add categories
df['age_cat'] = pd.cut(x=df.age, bins=range(0, 91, 10), ordered=True)
# remove unused categories
df['age_cat'] = df['age_cat'].cat.remove_unused_categories()
g = sns.catplot(kind='count', data=df, x='embark_town', hue='sex', col='age_cat', col_wrap=3, height=2.5, aspect=2)
axes = g.axes.flat
# label every bar in every facet
for ax in axes:
    for c in ax.containers:
        ax.bar_label(c, fmt='%.0f', padding=2)