I was doing some EDA, and I observed the following behavior with Seaborn.
Seaborn version: 0.12.2
Matplotlib version: 3.7.1
Input data
import pandas as pd
import seaborn as sns
data = {'Class': [0, 1, 1, 1, 1, 0, 1, 0, 1],
        'count': [509, 61, 18, 29, 8, 148, 54, 361, 46],
        'greek_char': ['Alpha', 'Alpha', 'Alpha', 'Alpha', 'Beta', 'Beta', 'Beta', 'Beta', 'Beta'],
        'value': ['A', 'B', 'D', 'G', 'A', 'B', 'B', 'C', 'C']}
df = pd.DataFrame(data)
Code
fig = sns.FacetGrid(data=df, col="greek_char", hue="Class")
_ = fig.map_dataframe(sns.barplot, x="value", y="count", dodge=True)
I obtained the following graph:
Here are some inconsistencies:
Notice that Alpha doesn't have C in the dataset, but it appears in the graph.
Alpha A has only Class 0, but I see both classes in the graph.
Values G and D are missing from the graph.
I would appreciate any help in determining whether this behavior is a bug, expected behavior, or if I am missing something.
If you try running your code using fig.map instead of fig.map_dataframe, you'll get the warning: UserWarning: Using the barplot function without specifying 'order' is likely to produce an incorrect plot.
Once I add the order argument, I get the correct plot.
import pandas as pd
import seaborn as sns
data = {"Class": [0, 1, 1, 1, 1, 0, 1, 0, 1],
        "count": [509, 61, 18, 29, 8, 148, 54, 361, 46],
        "greek_char": ["Alpha"]*4 + ["Beta"]*5,
        "value": ["A", "B", "D", "G", "A", "B", "B", "C", "C"]}
df = pd.DataFrame(data)
fig = sns.FacetGrid(data=df, col="greek_char", hue="Class")
fig = fig.map_dataframe(sns.barplot,
                        x="value",
                        y="count",
                        order=sorted(df["value"].unique()))
fig.add_legend()
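To see why order matters here, it may help to look at what each barplot call receives. FacetGrid calls the plotting function once per facet and hue subset, and each call only sees its own rows. The sketch below is not seaborn code. It is a plain-Python illustration that assumes a call without order numbers only the categories present in its subset, in order of first appearance. The helper bar_positions and the rows list are made up for this illustration.

```python
from collections import defaultdict

# Same rows as the question's DataFrame, kept as plain tuples so the sketch
# runs without pandas: (Class, count, greek_char, value).
rows = [
    (0, 509, "Alpha", "A"), (1, 61, "Alpha", "B"), (1, 18, "Alpha", "D"),
    (1, 29, "Alpha", "G"), (1, 8, "Beta", "A"), (0, 148, "Beta", "B"),
    (1, 54, "Beta", "B"), (0, 361, "Beta", "C"), (1, 46, "Beta", "C"),
]


def bar_positions(rows, order=None):
    """Map each (facet, hue) subset to {category: x position}.

    Assumption: without `order`, a subset numbers only the categories it
    contains, in order of first appearance; with `order`, every subset
    uses the same shared numbering.
    """
    subsets = defaultdict(list)
    for cls, _count, greek, value in rows:
        if value not in subsets[(greek, cls)]:
            subsets[(greek, cls)].append(value)

    positions = {}
    for key, seen in subsets.items():
        cats = order if order is not None else seen
        positions[key] = {v: cats.index(v) for v in seen}
    return positions


independent = bar_positions(rows)
shared = bar_positions(rows, order=sorted({r[3] for r in rows}))

for key in sorted(independent):
    print(key, "no order:", independent[key], "| with order:", shared[key])
```

Under that assumption, Alpha / Class 0 puts A at position 0 and Alpha / Class 1 puts B at position 0 as well, which would explain two bars appearing over the first tick. D and G land at positions 1 and 2, so they would be drawn under whatever labels the shared x-axis ends up with, which is consistent with C showing up in the Alpha panel and D and G seeming to disappear. With a shared order, every subset uses the same positions.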