I am encountering difficulties adjusting the Seaborn KDE plot to properly align it with a strip plot in my visualization. Despite attempting various methods, such as modifying bw_adjust and manually scaling the KDE plot, I have not been able to achieve the desired outcome.
I want to plot both a KDE and a strip plot of one column in my dataframe, so that I can see the overall distribution and, at the same time, where the individual data points fall relative to the KDE. The strip plot also uses hue to color the points by category.
First, if I plot the strip plot on its own:
sns.stripplot(data=df, x="values", hue="category")
And if I plot the KDE on its own:
sns.kdeplot(data=df, x="values", ax=ax)
However, if I combine the two on the same axes:
fig, ax = plt.subplots(figsize=(16, 8))
sns.stripplot(data=df, x="values", hue="category", ax=ax)
sns.kdeplot(data=df, x="values", ax=ax)
For some reason, the KDE curve is drawn upside down when it shares the axes with the strip plot, and I am not sure why this happens. So I have two problems to solve: the inverted KDE, and scaling the KDE so that it fits well with the strip-plot points.
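One way to see where the inversion comes from is to inspect the shared axes right after the strip plot is drawn. The sketch below uses a small synthetic DataFrame as an assumed stand-in for df (same column names, values and category). If the strip plot has inverted the y-axis, any curve drawn afterwards on the same axes, including the KDE, will appear upside down.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the question's df (assumed columns: values, category)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "values": rng.normal(size=200),
    "category": rng.choice(["a", "b", "c"], size=200),
})

fig, ax = plt.subplots()
sns.stripplot(data=df, x="values", hue="category", ax=ax)

# Inspect what the strip plot did to the y-axis that the KDE would share
ylim_after_strip = ax.get_ylim()
inverted = ax.yaxis_inverted()
print("y-limits after stripplot:", ylim_after_strip)
print("y-axis inverted:", inverted)
```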
First, I tried changing some parameters of the seaborn stripplot and kdeplot functions:
sns.stripplot(data=df, x="values", hue="category", jitter=0.1, dodge=True, ax=ax)
sns.kdeplot(data=df, x="values", ax=ax, bw_adjust=-1)
Setting bw_adjust to a negative number appears to fix the first issue, as the result below shows, although I am not sure why it works:
However, the strip-plot points are still not displayed well, and I am not sure how to scale the KDE so that it fits the strip-plot data.
To address this, I tried computing and plotting the KDE manually with SciPy instead of seaborn's kdeplot:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from scipy.stats import gaussian_kde
df = pd.read_csv("inputs/desired_data.csv")
fig, ax = plt.subplots(figsize=(16, 8))
# Strip plot
sns.stripplot(data=df, x="values", hue="category", jitter=0.1, dodge=True, ax=ax)
# Computing KDE manually
x = df["values"].values
kde = gaussian_kde(x, bw_method=1)
x_grid = np.linspace(x.min(), x.max(), 1000)
y = kde.evaluate(x_grid)
# Normalize and scale the KDE plot
y_scaled = y / y.max() # Example normalization, adjust as needed
# Plot the scaled KDE
ax.plot(x_grid, y_scaled, color='blue')
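A minimal sketch of one way to make the manual approach line up, assuming the default single-row strip plot whose y-limits run from the bottom edge (numerically larger) to the top edge (numerically smaller): keep the bandwidth positive (here SciPy's default Scott's rule, since a bandwidth factor is meant to be positive), normalize the density to a peak of 1, and then map it into the strip plot's band so that it rises from the bottom edge. The synthetic data and the 90% peak height are hypothetical choices, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.stats import gaussian_kde

# Synthetic stand-in for inputs/desired_data.csv (assumed columns: values, category)
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "values": np.concatenate([rng.normal(0, 1, 150), rng.normal(4, 0.5, 50)]),
    "category": rng.choice(["a", "b"], size=200),
})

fig, ax = plt.subplots(figsize=(16, 8))
sns.stripplot(data=df, x="values", hue="category", jitter=0.1, dodge=True, ax=ax)

# Compute the KDE with a positive bandwidth (SciPy's default, Scott's rule)
x = df["values"].to_numpy()
kde = gaussian_kde(x)
x_grid = np.linspace(x.min(), x.max(), 1000)
density = kde(x_grid)

# Read the strip plot's limits; on the inverted axis, bottom > top numerically
bottom, top = ax.get_ylim()
height = 0.9 * (bottom - top)  # hypothetical choice: KDE peak fills 90% of the band

# Normalize the peak to 1, then measure upwards from the bottom edge.
# Because the axis is inverted, "upwards" means subtracting from `bottom`.
y_plot = bottom - height * density / density.max()
ax.plot(x_grid, y_plot, color="blue")
ax.set_ylim(bottom, top)  # restore the strip plot's limits after ax.plot autoscales
```

Flipping the sign through the mapping (rather than through a negative bandwidth) keeps the density itself correct and only changes where it is drawn.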
With this code, I get the following plot:
This shows the strip-plot points better. However, the KDE is inverted again. And if I change bw_method=1 to bw_method=-1 in kde = gaussian_kde(x, bw_method=1), I get the following plot:
So the result gets worse again. Ideally, I would like to obtain a plot like this:
You are trying to plot a somewhat unusual combination. The stripplot inverts the y-axis and sets its limits between -0.5 and 0.5. The kdeplot sets the minimum of the y-axis to 0 (so the curve "sits" on the x-axis), and scales the height so that the area under the curve is 1.
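The scale mismatch can be checked numerically. The sketch below applies SciPy's gaussian_kde to a synthetic sample (an assumption, chosen with a fairly wide spread) and confirms that the density is non-negative and integrates to about 1. Its peak height is therefore in density units, which depend on the spread of the data and have nothing to do with the strip plot's fixed -0.5 to 0.5 band.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic sample (assumption): values with a standard deviation of about 10
rng = np.random.default_rng(2)
values = rng.normal(loc=50, scale=10, size=500)

kde = gaussian_kde(values)
# Extend the grid well past the data so almost all probability mass is covered
grid = np.linspace(values.min() - 50, values.max() + 50, 4000)
density = kde(grid)

# Trapezoid rule written out by hand (avoids np.trapz / np.trapezoid naming differences)
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid))
print(f"area under the KDE: {area:.4f}")
print(f"peak density: {density.max():.4f}")
```

With data this spread out, the peak is only a few hundredths high, so on the strip plot's axis the curve would be a barely visible bump.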
The easiest approach is to use a twin axis. If the kdeplot is drawn on the original axis, the left y-axis shows the height of the KDE. The stripplot can then go on the twin axis, which has an independent y-axis.
Here is some example code:
import seaborn as sns
import matplotlib.pyplot as plt
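iris = sns.load_dataset('iris')  # downloads (and caches) the example dataset
fig, ax = plt.subplots()
# KDE on the original axis: the left y-axis shows the density
sns.kdeplot(iris, x='sepal_width', ax=ax)
# Twin axis: shares x, but has its own y-axis for the strip plot to invert
ax2 = ax.twinx()
sns.stripplot(iris, x='sepal_width', hue='species', ax=ax2, palette='turbo', dodge=True)
plt.tight_layout()
plt.show()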